Introduction


1.1 Overview 


The MIT Laboratory for Computer Science (LCS) is an interdepartmental laboratory whose 
principal goal is research in computer science and engineering. 


In 1963, when the Laboratory was founded as Project MAC, it explored and developed one 
of the world’s earliest timeshared computer systems. This 1960’s research on the Compatible 
Time Sharing System (CTSS), and its successor MULTICS, contributed innovations like the 
writing of operating systems in high level programming languages, virtual memory, tree 
directories, online scheduling algorithms, line and page editors, secure operating systems, 
concepts and techniques for access control, computer aided design, and two of the earliest 
computer games—space wars and computer chess. 


These early developments laid the foundations for the Laboratory’s work in the 1970’s on
knowledge based systems (for example, the MACSYMA program for symbolic mathematics),
on natural language understanding, and on the development (with BBN) and use of packet
networks. In the 1970’s, the Laboratory also developed theoretical results in complexity
theory and linked cryptography to computer science through concepts and algorithms for
public key encryption (RSA). In the late 1970s, Project MAC, renamed as the Laboratory for
Computer Science (LCS), embarked on research in such areas as clinical decision making, 
in the exploration of cellular automata at the borderline between physics and computation, 
and on the social impact of computers. At the same time, the Laboratory began two major 
research programs in distributed systems and languages, and in parallel systems. These led 
to the notions of data abstractions and the Clu language; the Argus distributed system; the 
dataflow principle and associated languages, and architectures of parallel systems; local area 
ring networks; to program specification; and workstation development, where the Laboratory 
contributed the earliest UNIX ports and compilers and the NuBus architecture, now used in
commercial computers like Apple’s Macintosh II.


The Laboratory’s current research falls into four principal categories, Parallel Systems; Sys- 
tems, Languages, and Networks; Intelligent Systems; and Theory. The principal technical 
goals and expected consequences in each of these four categories are as follows: 


In Parallel Systems, we strive to harness the power and economy of numerous processors 
working on the same task. Research in the area involves the analysis and construction 
of various hardware architectures and programming languages that yield, over a broad set 
of applications, cost-performance improvements of several orders of magnitude relative to
single processors. This research is expected to affect most of tomorrow’s machines which we 
expect to be of the multiprocessor variety—not only because of potential cost performance 
benefits but also because of the natural, yet unexploited, concurrence that characterizes 
contemporary and prospective applications from business to sensory computing. 


In Systems, Languages, and Networks, our objective is to provide the concepts, methods, 
and environments that will enable heterogeneous computers, each working on different tasks, 
to communicate efficiently, conveniently, and reliably with one another in order to exchange 
information needed and supplied by their respective programs. Such communication may 
involve, beyond conventional electronic mail and file transfer, the calling of programs in one
environment from programs in another, perhaps different, environment and the sharing of 
structured data among such programs. This research is also expected to have a broad impact 
on future systems, since virtually every machine will be connected to a network. 


Taken together, these two thrusts in parallel and networked machines signal our expectation 
that future computer systems will consist of multiprocessors interconnected by local and
long haul networks, and perhaps someday by national network infrastructures as ubiquitous
and as important as today’s telephone and highway infrastructures. 


In the Intelligent Systems area, our technical goals are to understand and construct programs 
and machines that have greater and more useful sensory and cognitive capabilities. Examples 
include the understanding of spoken messages, systems that can learn from practice rather 
than by being explicitly programmed, and programs that reason about clinical issues and 
help in clinical decision making. We expect tomorrow’s intelligent systems to be easier to 
use than today’s programs across a broad front of applications. 


In our fourth category of research, Theory, we strive to understand and discover the fun- 
damental forces, rules, and limits of computer science. Theoretical work permeates many 
of our research efforts in the other three areas, for example, in the pursuit of parallel al- 
gorithms and in the study of fundamental properties of idealized parallel architectures and 
computer networks. Theory also touches on several predominantly abstract areas, like the 
logic of programs, the inherent complexity of computations, and the use of cryptography 
and randomness to the formal characterization of knowledge. The impact of theoretical 
computer science upon our world is expected to continue its past record of improving our 
understanding and helping us pursue new frontiers with new models, concepts, methods, 
and algorithms. 


1.2 Highlights of the Year 


The year 1988 marked the 25th Anniversary of our Laboratory. The occasion was celebrated
with a two day symposium on current research for an international audience of some 1000
people, and a testimonial banquet attended by over 1200 members and guests of both the LCS
and AI laboratories. Chaired by Professor Albert R. Meyer, the celebration was memorable
and successful. 


Research highlights during the reporting period were as follows: 


1. Dr. Victor Zue and his research group moved from MIT’s Research Laboratory of Electronics
to LCS. This move promises to be significant for both the speech research effort
and for other LCS groups through the potential synergism between speech research and 
computer architecture. 


2. We concluded a major agreement with Motorola to build the Dataflow Machine, conceived
and designed by the Computation Structures Group. This effort is significant for
it represents the first major test of this new architecture invented by our Laboratory. 
3. A new group, the Computer Architecture Group, was formed. It includes Professors
Agarwal, Dally, and Ward, their students and staff. This large group of
computer architects plans to embark on a new project, NuMesh. The NuMesh involves
an interconnection and intercommunication standard that goes beyond the notion of 
a computer bus to three dimensional structures. As currently envisioned, the NuMesh 
will consist of small cubes (about 2cm on the side) which will be able to plug together 
with other similar cubes at all six of their sides. Each cube will contain processing and 
communication chips. We envision that users of this technology will first construct in 
Tinkertoy fashion a special purpose aggregate that is best suited to their problems, 
and will then run these problems on the so-constructed machine. This approach is 
thus expected to make possible the benefits of special purpose computation out of 
general purpose subsystems. We are planning to fund this research out of an industrial 
consortium of manufacturers. 


During 1988-89, the Laboratory continued its successful Distinguished Lecturer Series with
presentations by David L. Parnas, Professor of Computing and Information Science, Queen’s
University; David S. Johnson, Department Head, Mathematical Foundations of Computing,
AT&T Bell Laboratories; Raj Reddy, Director of Robotics Institute, Carnegie Mellon
University; and Robert W. Taylor, Director, Systems Research Center, Digital Equipment
Corporation.


During this reporting period, Professor David L. Tennenhouse joined the Advanced Network 
Architecture Group; Drs. Gregory Papadopoulos and Gill Pratt became Research Associates 
in the Computation Structures and the newly formed Computer Architectures Group, re- 
spectively. Dr. Victor Zue and his Spoken Language Systems Group, including four research 
scientists, 15 students, two visitors, and two support staff also joined the Laboratory. Two 
staff accountants, Ms. Azi Djazani and Mr. David Ruble, joined the LCS administrative
staff, and Ms. Mary Mitchell joined MIT and LCS as Administrative Officer of the Laboratory.


The Laboratory is organized into 18 research groups, an administrative unit, and a computer 
service support unit. The Laboratory’s membership includes a total of 350 people—105 
faculty and research staff, 35 visitors, affiliates, and postdoctoral associates, 30 support staff, 
125 graduate students, and 55 undergraduate students. The academic affiliation of most of 
the Laboratory’s faculty and students is with the Department of Electrical Engineering 
and Computer Science (EECS). The funding is predominantly from the U.S. Government’s 
Defense Advanced Research Projects Agency, which accounts for about half of the total. 
The Laboratory is also funded by and has extensive links with industrial organizations. 
These include partnerships for the construction of major hardware systems, consortia for the 
development and maintenance of standards, like X Windows, and joint studies on research 
areas of common concern. Technical results of our research in 1988-89 were disseminated 
through publications in the technical literature, through Technical Report numbers 426 
through 453, and Technical Memoranda numbers 363 through 400. 
2.1 Introduction


The Advanced Network Architecture project continues to explore a number of problems 
related to the design of advanced data networks. As networks get bigger and faster, it is 
important to explore new design approaches, since the current assumptions and protocols 
may not scale well to match the expectations of tomorrow. 


The central problem of our group has been the management of resources within the network: 
bandwidth, switching capacity, and buffering. If we are to achieve higher speeds and larger 
size, the tradeoffs among these resources must change, and new algorithms and approaches 
will be needed. 


In the following sections, a number of specific projects related to this overall goal are described.


2.2 Network Control Algorithms 


Lixia Zhang has nearly completed work on a new network architecture which integrates 
resource management and traffic control into the system. The new architecture can support 
a wide variety of applications and incorporate new technologies. It includes three basic
parts: an elemental data transmission entity called a flow, an interface between users and 
the network which allows a flow to specify a set of performance attributes, and a distributed 
control algorithm that regulates network traffic to ensure overall performance. 


2.3 Rate-based Flow Control 


Previously, several members of the group designed a new transport protocol, NETBLT, 
which contained novel algorithms for flow control and error recovery [79]. In particular, the 
protocol contained a flow control algorithm based on rate regulation, rather than window 
permissions. Rate control is expected to provide smoother and more effective utilization of 
high bandwidth, long delay links. 
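
The contrast between the two styles of flow control can be sketched in a few lines of Python. This is an illustration only, not the NETBLT specification: the function names and the parameters `burst_size`, `burst_interval`, `window_packets`, and `round_trip_time` are assumptions introduced here. A rate-based sender departs packets on a fixed schedule, while the throughput of a window-based sender is bounded by the window divided by the round trip time, which is why long delay links penalize it.

```python
def rate_paced_schedule(num_packets, burst_size, burst_interval):
    """Departure time in seconds of each packet for a rate-based sender.

    The sender emits `burst_size` packets back to back at the start of every
    `burst_interval` seconds and never waits for acknowledgments.
    All names and values are hypothetical, chosen for illustration.
    """
    if burst_size <= 0 or burst_interval <= 0:
        raise ValueError("burst_size and burst_interval must be positive")
    return [(i // burst_size) * burst_interval for i in range(num_packets)]


def rate_based_throughput(burst_size, burst_interval):
    """Packets per second for the rate-based sender; independent of delay."""
    return burst_size / burst_interval


def window_limited_throughput(window_packets, round_trip_time):
    """Upper bound in packets per second for a window-based sender.

    At most one window can be outstanding per round trip, so the bound
    shrinks as the round trip time grows.
    """
    return window_packets / round_trip_time


if __name__ == "__main__":
    print(rate_paced_schedule(num_packets=6, burst_size=2, burst_interval=0.5))
    # The same 8-packet window yields less on a longer path.
    print(window_limited_throughput(8, 0.1), window_limited_throughput(8, 0.5))
```

In this sketch the rate-based sender's throughput depends only on its two rate parameters, so it can be tuned to the link independently of the path delay.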


The first version of that protocol did not have an algorithm for dynamic adjustment of the 
rate, but instead used manual adjustment. While manual adjustment was sufficient for a 
first set of experiments, it was not the basis for a practical system. Subsequently, Mark
Lambert proposed a number of dynamic rate adjustment algorithms. 


Helmut Rebstock, a visiting scientist from Siemens Corporation, assisted by James Davin,
used the interactive network simulator to explore the behavior of these proposed algorithms 
for dynamic adjustment of transmission rates in NETBLT. Rebstock began a study of the 
capacity of adaptive variants of NETBLT to support equitable sharing of bandwidth among 
connections in a congested network. 
Andrew Heybey performed experiments to extend the results reported previously in his
undergraduate thesis [161]. A form of slow start was added to Mosely’s rate-based protocol 
[239] and was shown (through simulation) to increase the percentage of the link capacity 
that can be used. The most important result is the observation that the protocol (with or 
without slow start) operates in either a stable or unstable region. As the fraction of the 
link’s capacity that the protocol attempts to use is increased, the queue lengths abruptly 
change from an average of approximately five to wild oscillation between zero and several 
thousand. (In these experiments, there is no limit on queue length, and no packets are ever 
dropped). 


2.4 Fair Queueing in Gateways 


James Davin and Andrew Heybey experimented with a novel “Fair Share” queueing algorithm
developed at Xerox PARC [94]. They simulated the described algorithm and verified
its correct operation in several simple network topologies both with and without the presence 
of ill behaved users. In its simplest form, the algorithm enforces fairness—no user may use 
more than its fair share of the output bandwidth. It can also be used to enforce policy by 
giving some users a larger share of the bandwidth than others. Because the present algorithm
only comes into play when the output queue length in the switch is greater than zero,
the extension of fair queueing for operation on an underutilized link is being contemplated. 
Packet discard strategies are also being studied. 
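
The general idea of fair queueing, including weighted shares for policy, can be sketched as follows. This is a simplified illustration and should not be read as the algorithm of [94]: the class name `FairQueue`, the finish-tag bookkeeping, and the choice to advance the virtual clock to the tag of the packet last served are all assumptions made for brevity. Each packet receives a finish tag that grows with the amount its user has already queued, and the switch always serves the smallest tag.

```python
import heapq
from collections import defaultdict


class FairQueue:
    """Simplified fair queueing on one output link (illustrative sketch)."""

    def __init__(self, weights=None):
        self.weights = weights or {}            # user -> relative share (default 1)
        self.last_finish = defaultdict(float)   # user -> finish tag of last packet
        self.virtual_time = 0.0                 # tag of the packet last served
        self.heap = []
        self.counter = 0                        # arrival order, breaks tag ties

    def enqueue(self, user, size):
        weight = self.weights.get(user, 1.0)
        start = max(self.virtual_time, self.last_finish[user])
        finish = start + size / weight          # larger weight, smaller increment
        self.last_finish[user] = finish
        heapq.heappush(self.heap, (finish, self.counter, user, size))
        self.counter += 1

    def dequeue(self):
        """Return (user, size) of the next packet to send, or None if idle."""
        if not self.heap:
            return None
        finish, _, user, size = heapq.heappop(self.heap)
        self.virtual_time = finish
        return user, size


def service_order(queue):
    """Drain the queue and return the sequence of users served."""
    order = []
    while True:
        item = queue.dequeue()
        if item is None:
            return order
        order.append(item[0])


if __name__ == "__main__":
    fq = FairQueue()
    for _ in range(4):
        fq.enqueue("hog", 1)      # ill behaved user floods the queue first
    fq.enqueue("polite", 1)
    print(service_order(fq))
```

With a first-come, first-served queue the single "polite" packet would wait behind all four "hog" packets; here it is served second. Passing `weights={"a": 2.0, "b": 1.0}` gives user "a" twice the share of user "b".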


2.5 Protocol Performance Studies 


David Clark engaged in a study of TCP processing overhead that strongly suggests that the 
details of TCP are not a central issue in host level processing. The results of this study 
indicate that, with current RISC processors, the specific TCP processing steps would permit 
packet transmission at a large fraction of a gigabit per second. This work, performed jointly 
with V. Jacobson at LBL, H. Salwen at Proteon, Inc., and J. Romkey, is reported in [82]. 


Eman Hashem studied packet clustering in network environments similar to those of the In- 
ternet. This work aimed at analyzing the causes and consequences of packet aggregation and 
its effect on congestion. Two major causes were identified: the TCP slow start retransmission 
strategy, and the interaction of the gateway and TCP congestion control mechanisms. 


The slow start algorithm opens the TCP sender window exponentially by incrementing the 
window size by one for each acknowledgment received. This behavior leads to the clustering 
of the packets belonging to each TCP connection that is newly opened or is recovering from 
congestion. This phenomenon is not very harmful on its own, as the clustering persists only as 
long as the window is being opened. In a heavily loaded network, however, the TCP slow start 
algorithm and gateway congestion control schemes interact so as to increase the frequency 
of exponential window sizing. The gateway signals congestion by discarding packets in 
excess of its capacity. TCP responds to this signal by shutting its window and reopening it
exponentially. Following recovery from packet loss, the TCP continues to open the window 
linearly. Eventually, the previous level of congestion is again realized, and the recovery 
process is repeated. Thus, the network oscillates between congestion recovery and load 
optimization, causing packets from most connections to aggregate at the bottleneck resources 
during each congestion cycle. This global effect, involving all connections, coupled with local 
clustering of packets from the same connection, leads to a performance degradation. Long 
end-to-end delays result from the high queueing delays incurred at the bottleneck resources, 
and throughput decreases owing to bandwidth wasted in retransmissions. The extent of the 
packet clustering effects depends on the details of the congestion schemes and on how quickly 
they react to congestion. Although slow start encourages packet clustering, it minimizes 
wasted bandwidth by dynamically adjusting its window size to the current network load 
limit. 
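
The congestion cycle described above can be sketched with a per-round-trip model of a single sender's window. The model is a deliberate simplification under stated assumptions: time advances one round trip per step, a loss occurs whenever the window exceeds a fixed `capacity`, and the point at which exponential growth gives way to linear growth is set to half the window at the time of loss. The function name and parameters are hypothetical.

```python
def simulate_window(capacity, round_trips, initial_threshold=None):
    """Return the sender window (in packets) at the start of each round trip.

    capacity          -- largest window the bottleneck carries without loss
    round_trips       -- number of round trips to simulate
    initial_threshold -- window at which growth turns linear (default: capacity)
    """
    window = 1
    threshold = capacity if initial_threshold is None else initial_threshold
    trace = []
    for _ in range(round_trips):
        trace.append(window)
        if window > capacity:
            # The gateway discards the excess; the sender shuts its window
            # and remembers half the failed window as the new threshold.
            threshold = max(window // 2, 2)
            window = 1
        elif window < threshold:
            # One increment per acknowledgment doubles the window each round trip.
            window = min(window * 2, threshold)
        else:
            # Past the threshold the window opens linearly.
            window += 1
    return trace


if __name__ == "__main__":
    print(simulate_window(capacity=8, round_trips=12))
```

The trace shows the sawtooth: exponential opening, a loss, a second exponential phase up to the threshold, and then linear growth until the previous level of congestion is reached again.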


Another approach to studying TCP behavior was pursued by Timothy Shepard. A system 
for collecting and storing about 12 hours of the protocol headers of all the packets on one 
of the main Ethernets in the Laboratory has been built and is now in continuous operation. 
The system has been useful as a stand-alone aid for debugging failures of the operational 
network. It provides easy access to packet traces for developers of experimental systems and 
protocols. The system is used mainly as a source of traces to support research in the analysis 
of TCP packet traces. 


A study in the analysis of TCP packet traces is in progress. This study explores the graphical 
presentation of packet traces to a human analyst and its effect on the human’s ability to 
absorb and understand a packet trace. 


2.6 Random Drop Queue Management 


The congestion control scheme currently employed in Internet gateways is a simple mech- 
anism that requires no information about the connections passing through each gateway. 
In the absence of such per-connection information, it is difficult to give the connections 
accurate signals in the event of congestion. One hypothesis advanced within the Internet 
Engineering Task Force was that a slightly more intelligent, statistical mechanism might
afford satisfactory results without the need for per-connection information. This mechanism 
was dubbed “random drop” because it entails discard of a randomly chosen packet from 
the bottleneck queue whenever congestion is detected. In theory, the random drop scheme 
discriminates against connections that consume an inordinate share of available bandwidth, 
for such connections are likely to have more packets in the queue than other connections, 
and the probability of packet loss for any one connection is proportional to the frequency of 
its representation in the bottleneck queue. On the strength of this analysis, random drop 
was expected to afford both service equity and improved aggregate performance. 
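
The proportionality argument can be made concrete with a short sketch. The queue is modeled as a list holding one connection identifier per queued packet; this representation and the function names are assumptions for illustration. Choosing the victim uniformly at random means that a connection holding a fraction f of the queue loses a packet with probability f.

```python
import random
from collections import Counter


def drop_probabilities(queue):
    """Probability that each connection loses a packet on one random drop.

    `queue` is a list with one connection identifier per queued packet.
    """
    total = len(queue)
    if total == 0:
        return {}
    return {conn: count / total for conn, count in Counter(queue).items()}


def random_drop(queue, rng=random):
    """Remove one uniformly chosen packet and return its connection identifier."""
    if not queue:
        raise ValueError("cannot drop from an empty queue")
    return queue.pop(rng.randrange(len(queue)))


if __name__ == "__main__":
    bottleneck = ["A"] * 6 + ["B"] * 2   # connection A holds 75% of the queue
    print(drop_probabilities(bottleneck))
    print(random_drop(bottleneck), len(bottleneck))
```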


By simulating the random drop mechanism, Eman Hashem found that it does not perform as 
well as expected and that performance improvements over the current scheme are negligible. 
While random drop can improve the fairness of the gateway packet drop, it has no neces- 
sary effect upon the aggregate distribution of bandwidth in the network—which is largely 
determined by the interaction between gateways and TCP congestion control schemes. Even
though random drop penalizes the connections fairly for causing congestion, it cannot control
their flow. Any connection that attempts to maximize its flow is rewarded with a higher
share of available bandwidth. For example, connections with short end-to-end delay react to
the gateway congestion signal quickly and recover faster—thus realizing higher throughput 
by squeezing bandwidth away from longer delay connections. A similar bandwidth advan- 
tage is realized by misbehaving connections that use redundant transmissions to increase the 
probability of data reaching the destination in a minimum number of round trips. Thus, any 
TCP connection, well behaved or misbehaved, that has the ability to increase its flow above 
the other connections can achieve a higher bandwidth share even while employing random 
drop. Unfortunately, this behavior also degrades the performance of the other connections, 
for they spend much of their reduced bandwidth shares recovering from the congestion caused 
by the aggressive connections. 


One variation on random drop is to drop packets with some small probability before the 
buffer of the bottleneck resource is 100% full. In this way, the connections contributing 
most to congestion are afforded an early signal to slow down before gateway queues begin 
to overflow. This scheme, dubbed “early random drop,” may represent a profitable balance 
between dropping too early, effectively reducing the resource’s utilization, and dropping too 
late, achieving no improvement over the simple random drop. Early random drop is still 
under study with no significant advantages seen so far. 
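
A minimal sketch of the early random drop decision follows. The parameters `threshold` and `drop_probability` are hypothetical tuning knobs, and their values are not taken from the study; the trade-off described above corresponds to choosing them neither too low (wasting capacity) nor too high (behaving like simple random drop).

```python
import random


def enqueue_with_early_drop(queue, packet, capacity, threshold,
                            drop_probability, rng=random):
    """Queue `packet`, possibly discarding one randomly chosen packet.

    A discard is forced when the queue would exceed `capacity`, and happens
    with probability `drop_probability` once the queue is longer than
    `threshold`. Returns the discarded packet, or None if nothing was dropped.
    """
    queue.append(packet)
    overflow = len(queue) > capacity
    early = len(queue) > threshold and rng.random() < drop_probability
    if overflow or early:
        # The victim is chosen uniformly, so heavy users are hit most often.
        return queue.pop(rng.randrange(len(queue)))
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    q = []
    drops = 0
    for i in range(20):
        conn = "A" if i % 4 else "B"   # connection A sends three times as often
        if enqueue_with_early_drop(q, conn, capacity=10, threshold=5,
                                   drop_probability=0.2, rng=rng) is not None:
            drops += 1
    print(len(q), drops)
```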


2.7 Advanced Network Simulator 


Previously, Andrew Heybey and David Martin, with help from other members of the group, 
developed a network simulator to support the research described in the previous sections. The 
simulator uses the X Window System to display the state of the network as the simulation is 
running, and to allow the user to use the mouse to change the simulation parameters. Data 
produced by the simulation can also be logged to disk for post-processing. The simulator, 
by permitting a visual display of the network behavior as the simulation proceeds, permits 
a quick and intuitive understanding of complex network behavior. 


In this year, Andrew Heybey has improved the performance of the simulator by eliminating 
bottlenecks in the simulator’s X Window user interface code, and by using the ability of 
the GNU C compiler to compile functions inline to eliminate procedure call overhead where 
possible. A variety of bugs have been fixed, and the improvements have been released to 
other interested parties, for whom at least a minimal level of support is provided. The 
simulator is being actively used by people at Washington, Cray, Purdue and Mitre. 


2.8 Network Naming Services


Karen Sollins continued her work on providing directory services in the Internet. A directory 
service permits the location and identification of people, services, and resources in order to 
access them. A number of directory services exist, so the focus of this work is to provide a
framework for accessing a directory service that is general enough that most existing directory 
services can add a simple access veneer, while providing a rich communication protocol for 
requesting information from such a service. The framework permits directory services to 
name each other as well, in order to navigate through a set of such services. 


Work began this year on a written plan for developing and deploying directory services in 
the Internet, and that effort nears completion. 


Sollins organized a workshop to discuss white pages directory services in the Internet. This 
workshop met at the Corporation for National Research Initiatives in Reston, Virginia,
and involved participants from research and industry. In addition, Sollins participated in a 
workshop organized by NASA and DOE that addressed problems of naming entailed by a 
transition from the Internet to the OSI protocol suite. 


2.9 Policy Routing 


David Clark completed work on a proposal for policy routing in the Internet [81]. An 
integral component of the Internet protocols is the routing function, which determines the 
series of networks and gateways a packet will traverse in passing from the source to the 
destination. Although there have been a number of routing protocols used in the Internet, 
they share the idea that one route should be selected out of all available routes based on 
minimizing some measure of the route, such as delay. Recently, it has become important to 
select routes in order to restrict the use of network resources to certain classes of customers. 
These considerations, which are usually described as resource policies, are poorly enforced 
by the existing technology in the Internet. Clark proposes an approach to integrating policy 
controls into the Internet. 


The proposal models the resources of the Internet (networks, links, and gateways) as being 
partitioned into Administrative Regions or ARs. Each AR has a globally unique name and 
is governed by a somewhat autonomous administration having distinct goals as to the class 
of customers it intends to serve, the qualities of service it intends to deliver, and the means 
for recovering its cost. To construct a route across the Internet, a sequence of ARs must be 
selected that collectively supply a path from the source to the destination. This sequence of
ARs is called a Policy Route, or PR. Each AR through which a Policy Route passes will be 
concerned that the PR has been properly constructed, that is, each AR may wish to insure 
that the user of the PR is authorized, the requested quality of service is supported, and 
that the cost of the service can be recovered. Before a PR can be used, however, it must be 
reduced to more concrete terms: a series of gateways which connect the sequence of ARs. 
These gateways are called Policy Gateways. 
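
The model lends itself to a small data structure sketch. The following Python is an illustration of the concepts only, not the mechanism of the proposal [81]: the class `AdministrativeRegion`, its fields, and the function `validate_policy_route` are hypothetical, and cost recovery is omitted. Each AR on a candidate Policy Route checks that the user is authorized, that the requested quality of service is supported, and that the next AR is directly reachable.

```python
from dataclasses import dataclass, field


@dataclass
class AdministrativeRegion:
    """One AR: a named region with its own local policy (hypothetical fields)."""
    name: str
    authorized_users: set = field(default_factory=set)
    services: set = field(default_factory=set)     # qualities of service offered
    neighbors: set = field(default_factory=set)    # names of adjacent ARs


def validate_policy_route(route, regions, user, service):
    """Return a list of (AR name, objection) pairs; empty if every AR accepts.

    `route` is a sequence of AR names from source to destination.
    """
    objections = []
    for i, name in enumerate(route):
        region = regions[name]
        if user not in region.authorized_users:
            objections.append((name, "user not authorized"))
        if service not in region.services:
            objections.append((name, "service not supported"))
        if i + 1 < len(route) and route[i + 1] not in region.neighbors:
            objections.append((name, "no gateway to " + route[i + 1]))
    return objections


def example_regions():
    """A three-region example used only for demonstration."""
    return {
        "campus": AdministrativeRegion(
            "campus", {"alice"}, {"best-effort"}, {"regional"}),
        "regional": AdministrativeRegion(
            "regional", {"alice"}, {"best-effort"}, {"campus", "backbone"}),
        "backbone": AdministrativeRegion(
            "backbone", {"alice"}, {"best-effort", "low-delay"}, {"regional"}),
    }


if __name__ == "__main__":
    regions = example_regions()
    route = ["campus", "regional", "backbone"]
    print(validate_policy_route(route, regions, "alice", "best-effort"))
    print(validate_policy_route(route, regions, "alice", "low-delay"))
```

Note that the sketch only checks a route that has already been chosen; it makes no attempt to synthesize one.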


Clark’s proposal is designed to permit as wide a latitude as possible in the construction and 
enforcement of policies. In particular, no topological restrictions are assumed. In general, 
the approach is driven by the belief that, since policies reflect human concerns, the system 
should primarily be concerned with enforcement of policy, rather than synthesis of policy. 
The proposal permits both end points and transit services to express and enforce local policy
concerns. 


2.10 Byzantine Routing Algorithms 


Most dynamic network layer routing algorithms depend on the proper operation of all the 
routing nodes for their correct operation. If one node is corrupted, and for example asserts 
that it is the best route to all destinations, most routing algorithms fail to detect that this 
is an error. 
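
The failure mode can be illustrated with a toy distance-vector computation. The sketch below is an assumption-laden example, not any particular routing protocol: a node picks, for each destination, the neighbor whose advertised cost plus link cost is lowest, and it has no means of checking what neighbors advertise. The function name and the example topology are hypothetical.

```python
def best_next_hops(advertisements, link_costs):
    """Choose the cheapest next hop per destination from neighbor advertisements.

    advertisements -- {neighbor: {destination: advertised cost}}
    link_costs     -- {neighbor: cost of the link to that neighbor}
    Returns {destination: (next hop, total cost)}.
    """
    table = {}
    for neighbor in sorted(advertisements):
        for dest, cost in advertisements[neighbor].items():
            total = link_costs[neighbor] + cost
            if dest not in table or total < table[dest][1]:
                table[dest] = (neighbor, total)
    return table


if __name__ == "__main__":
    links = {"B": 1, "C": 1}
    honest = {"B": {"X": 2, "Y": 5}, "C": {"X": 4, "Y": 1}}
    print(best_next_hops(honest, links))

    # Node C is corrupted and claims zero cost to every destination.
    corrupted = {"B": {"X": 2, "Y": 5}, "C": {"X": 0, "Y": 0}}
    print(best_next_hops(corrupted, links))
```

With honest advertisements traffic for X goes through B; once C lies, all traffic is drawn to C and nothing in the computation flags the change as an error.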


Radia Perlman completed her study [258] of Byzantine routing algorithms—routing algo- 
rithms that continue to operate correctly even if one or more routing nodes are corrupted in 
malicious ways. 


2.11 Network Management 


James Davin has continued his efforts to develop the Simple Network Management Protocol 
(SNMP) by participation in the relevant Internet Engineering Task Force working groups, 
by authoring documents that specify the protocol [73][74][282] and explain its design [103],
and by implementation. During this period, the MIT SNMP Development Kit software was 
developed and initially released. This software is a highly portable C language implementa- 
tion of the SNMP and has been ported to a variety of platforms. In particular, SNMP was 
implemented in the MIT C Gateway as part of this effort. 


2.12 Internet Architecture 


As part of our research effort, and in support of the ongoing extensions and changes to the 
Internet protocol suite, members of the group participated in a number of working groups, 
and contributed a number of design papers [80]. 


Internet Activities Board: David Clark continued to chair the Internet Activities Board, 
the steering board for the Internet protocol suite. He also attended the meetings of several 
working groups and task forces of the IAB. 


Inter-Autonomous System Routing Architecture: Lixia Zhang continued participa- 
tion in the Open-Routing Working Group under the Internet Engineering Task Force. 


Naming Services: As part of the work reported earlier on architectures for naming, Karen 
Sollins is a member of the Naming Task Force of the Distributed Systems Activities Board. 
She is also a member of the Autonomous Systems Task Force of the IAB. 
3.1 Summary


This year, we continued to work on the development of fundamental new methods for repre- 
senting medical knowledge and reasoning with it, exploration of means of integrating various 
results into coherent research systems, and formulating methods of merging decision analytic 
and AI reasoning. In addition, we are planning to use part of this year to summarize our 
accomplishments and experiences during the period of our grant, and to plan our future 
research. 


3.2 Plans 


3.2.1 Integration 


A few years ago, we hypothesized that the adoption of a uniform method of knowledge repre- 
sentation (based on contemporary developments in AI research) would give us an important 
advance in the ability to integrate various specific representation and reasoning techniques. 
We pursued this goal aggressively, but with considerably more frustration than we antici- 
pated. We reviewed some of our practical difficulties with this technology late last year [141] 
and have continued to explore the fundamental deficiencies in current AI representation 
approaches that have underlain those difficulties [96]. 


During the next year, we plan to attack these problems by taking both a theoretical and 
architectural approach to the problem of rational self-government, as defined by Jon Doyle. 
Architecturally, Doyle and Ramesh Patil are developing the principles and structures for 
knowledge bases which rationally manage their knowledge and inference methods. We ex- 
pect this organization for knowledge representation systems to have many advantages over 
the current crop of systems. Many current systems restrict the expressive power of their lan- 
guages in order to gain “efficiency,” but as our recent work has shown, this sort of “efficiency” 
defeats the most important uses of these systems. Other systems are more expressive, but 
are still limited by their inference methods, which apply mainly logical inference procedures 
in ways independent of the user’s goals. This also makes for inefficiency, as these systems 
sometimes prevent themselves from satisfying the user’s needs by wasting effort on inferences 
irrelevant to those needs. The goal of this study is to develop a set of architectural principles 
that permit design and development of new representation systems that explicitly take into 
account the objectives and constraints that they are to satisfy. 


Complementing this, Doyle and Michael Wellman¹ are studying how to achieve rationality in
the process of developing and revising plans for large distributed activities. The main focus
in this work will be on developing techniques for rational distributed reason maintenance.
All current reason maintenance systems carry out unbounded computations at each database
cycle. Though they save effort over previous approaches to belief revision by only examining
a portion of the knowledge base when effecting changes, that portion may be much or even
all of the knowledge base. Thus these systems are ill-suited for real time or rationally guided
operations, as they provide no way for the user to control the effort spent on a revision
or the distribution of effort over time. The aim of rational distributed reason maintenance
is to make revision computations as local as possible, with pursuit of possible revisions
across distances or different media under the influence of the goals and circumstances of the
reasoner. In the theoretical work, Doyle plans to continue development of a mathematical
theory of reason maintenance and rational self-government. In addition, he and Elisha Sacks²
plan to investigate the applicability of more techniques of modern mathematics to qualitative
reasoning about physical systems.


¹Wellman is a former student, now collaborating from the Air Force’s AFWAL, Dayton, OH.


In addition to such fundamental work on knowledge representation issues, we are also planning
to develop representation schemes at a higher, more specifically medically-relevant level
of detail. Inspired by the qualitative probabilistic network representation pioneered by
Wellman [319], Tze-Yun Leong is developing a taxonomy of the structural aspects of a decision
problem. The intent is to represent such concepts as clinical contexts, classes of therapeutic
interventions, causality, and dependency. By gaining insights into the structure of clinical
decisions, this exercise serves as a step toward realizing a uniform knowledge representation
language for an integrated artificial intelligence and decision analysis system for medical
reasoning.


This work will produce an appropriate representation for the formulation of decision prob- 
lems according to classical and recent models of decision making, such as qualitative or 
quantitative probabilistic networks, influence diagrams, or decision trees. We plan to pursue 
this work in the domain of pulmonary infiltrates in AIDS patients, a field in which Frank 
Sonnenberg is pursuing parallel and somewhat more applied studies. We plan to take ad- 
vantage of his work and thoughts, using them as a basis for the more theoretical work to 
be done here. In particular, in the coming year we hope to have developed a set of rep- 
resentation conventions that are completely adequate to describe any modeling issues that 
arise in the course of considering an AIDS/pulmonary infiltrate case. By looking at more 
clinical cases, the following complicated issues will be further explored and their implications 
on the proposed representation framework analyzed: contextual representation, representing 
multiple taxonomies, classification along multiple perspectives (in the same taxonomy), and 
the resulting interactions among the concepts. Theoretical formalization of the representation
framework will be attempted, and the resulting expressiveness of the framework will
be evaluated. We expect that the constellation of issues uncovered here will generalize to a 
much broader set of medical domains. 


3.2.2 Fundamental Methods 


We are focusing on a number of important fundamental techniques for medical reasoning: 
(1) an elegant and flexible formulation of diagnosis, (2) a powerful method of temporal 
reasoning and temporal belief maintenance, and (3) a number of interesting and critical
approaches to learning in expert systems.


²Sacks is a former student, now on the faculty at Princeton University.
Diagnostic Reasoning


Although the diagnostic algorithms of most medical AI systems have been based on a variety
of ad hoc computational mechanisms, recent research in the formulation of diagnostic
problems across other domains of AI has identified a more systematic analysis of the bases
of diagnostic reasoning. Thomas Wu is investigating case structuring, a sophisticated strategy
for solving complex medical diagnostic problems in the broad domain of general internal
medicine. In a difficult, multiple-disease medical case, a diagnostic system could help a
physician greatly by computing alternative formulations of the case. Case structuring generates
such alternative formulations. Instead of evoking disease solutions directly (and haphazardly)
from the symptoms in the medical case, our approach introduces an intermediate
clustering step to identify coherent aggregates of symptoms. Each aggregate of symptoms is
caused by a separate disease and therefore represents a separate differential diagnostic task
to be solved. A coherent set of tasks that explains the entire case is called a task formulation;
the set of task formulations constitutes the alternative formulations of a medical case.
Due to the intermediate clustering step, this method of case structuring is called symptom
clustering.


Symptom clustering derives its computational power from two sources. First, it exploits 
mutual constraints that derive from the symptoms in a case. Second, it takes advantage of 
the dual observation that many diseases map onto a few functional derangements and that 
each functional derangement maps onto several co-occurring symptoms. These functional 
derangements, called syndromes in clinical practice, offer a powerful source of heuristic
knowledge for finding the correct task formulation. In the past year, Wu has identified the 
case structuring strategy of medical problem solving, developed algorithms for the symptom 
clustering methodology, and implemented a prototype diagnostic system. 
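

The idea of a task formulation can be illustrated with a small sketch. This is not Wu's algorithm, which is not described in detail here; it is a naive enumeration under stated assumptions. The knowledge base CAUSES, its disease and symptom names, and the function names are all hypothetical. A cluster is treated as coherent when at least one disease can cause every symptom in it, and a task formulation is a partition of the case's symptoms into coherent clusters, each paired with its differential.

```python
# Hypothetical toy knowledge base: disease -> symptoms it can cause.
CAUSES = {
    "pneumonia": {"fever", "cough"},
    "heart_failure": {"dyspnea", "edema"},
    "copd": {"cough", "dyspnea"},
}


def differential(cluster):
    """Diseases that can cause every symptom in the cluster."""
    return {d for d, caused in CAUSES.items() if cluster <= caused}


def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def task_formulations(symptoms):
    """All partitions whose clusters each have a non-empty differential."""
    formulations = []
    for part in partitions(sorted(symptoms)):
        tasks = [(frozenset(block), differential(set(block))) for block in part]
        if all(diff for _, diff in tasks):
            formulations.append(tasks)
    return formulations


if __name__ == "__main__":
    for tasks in task_formulations({"fever", "cough", "dyspnea", "edema"}):
        print([(sorted(c), sorted(d)) for c, d in tasks])
```

Exhaustive enumeration grows very quickly with the number of symptoms, which is one reason the mutual constraints and syndrome knowledge described above matter in practice.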


In the next year, he plans to extend the theoretical framework of case structuring, refine 
the prototype diagnostic system, and evaluate the results. The extensions will cover a number
of broad issues in medical problem solving, including causal relationships, probabilistic
assessment, and test generation and problem solving strategies.


Causality in the current methodology is very shallow, allowing for only pathophysiological
causality between diseases and the symptoms they can cause. However, diseases may cause
other diseases, or they may synergistically enhance or oppose each other, thereby changing
their symptomatology. These processes of causal predisposition and causal interaction can
be incorporated into the symptom clustering methodology by modifying the knowledge base 
and diagnostic algorithm. 


Probabilistic notions are missing in the current framework, which is unrealistic since the
likelihood of diseases and causal influences ranges over several orders of magnitude. Therefore,
Wu will introduce prior probabilities for diseases and causal probabilities for associations
between symptoms and diseases. These probabilities will entail several considerations for
research: they can be changed by contextual information, such as age, sex, race, occupation,
and medical history; they can help guide the search for plausible task formulations; and they
can help determine symptoms that do not need to be explained.
Test generation and problem solving strategies are needed to make the diagnostic system
truly interactive. The current framework receives symptoms passively from the physician 
user; with test generation capabilities, it could actively seek relevant symptom data. Problem 
solving strategies are sequences of tests, representing the procedural knowledge that medical 
experts use to solve cases. Wu plans to incorporate test generation and problem solving 
strategies in a separate system to complement the existing diagnostic system. 


A prototype diagnostic system has been implemented and tested on a small medical knowl- 
edge base. To fully test the strengths and limitations of our approach, however, we plan to 
use a large knowledge base developed by the INTERNIST project. The form of the INTERNIST 
knowledge base is suitable for this diagnostic algorithm. Evaluation of our approach will 
therefore be based upon empirical comparisons between case structuring and the direct approach
of the INTERNIST system.


Temporal Reasoning 


Thomas Russ is continuing the development of the Temporal Control Structure for creating
expert systems that use time dependent data. The research has identified schemas of temporal
reasoning such as the abstraction of patient states from the analysis of examination and
laboratory results. Support for using hindsight to evaluate previously made decisions was
also provided. A paper titled “Using Hindsight in Medical Decision Making” was accepted
by the Symposium on Computer Applications in Medical Care, to be held in October 1989
[278]. It will be a finalist in the student paper competition sponsored by the Symposium.
This programming methodology and the system built to support it appear especially effective
for the implementation of monitoring and tracking systems. Russ, as part of his
doctoral dissertation, is using the system to implement a system for monitoring the treat- 
ment of patients with diabetic ketoacidosis. We expect this work to be completed by the 
end of 1989. 


A different problem of temporal reasoning arises in the Heart Failure (HF) Program. One of 
the problems with the existing Heart Failure Program is the inability to deal with situations 
in which the order of events or the time between cause and effect is important. Part of 
the reason for this difficulty is that the knowledge base representation of cause and effect 
does not include the properties needed for reasoning about the time relations. During the
past year we have been developing a representation for such time relations that will allow
specifying the essential features without requiring more specificity than is possible. The kinds
of distinctions needed include the time required to produce an effect, the duration of an effect
after the cause has been removed, the type of onset (gradual or acute), and the nature of
findings (sampled versus symptoms with duration). Since the probability of the effect is 
often a function of both the duration and the severity of the cause, suitable approximations 
of these functions need to be part of the knowledge base. We have been developing a 
representation that will capture these relations, allowing reasoning with hypotheses that are
clinically distinct even though they involve the same nodes. We used the representation
for an example knowledge base that captures the important clinical distinctions in a case
that the existing Heart Failure Program is unable to handle. We are currently working on
strategies for limiting hypotheses to only those that are different in significant ways. 


Learning 


A number of factors have come together to suggest the critical and ever-more-widely recognized
role of learning in medical AI. First, it is widely recognized that the handcrafting
of expert systems is a difficult and time consuming task that could be significantly eased if
part of model construction could be automated. Second, the increasing availability of large
bodies of carefully collected real clinical data means that system builders need
not rely on human expert judgment as exclusively as we once did. We are investigating the
applicability of several learning approaches.


In the context of the Heart Failure Program, we are comparing machine learning approaches, 
such as that of ID3 [268], for producing decision trees to the logistic regression approach used
on a large database of patients presenting with chest pain [267]. We made arrangements
with Drs. Selker and D'Agostino [267] to use their database of 5773 cases with chest pain
or shortness of breath in the emergency room. This database has many clinical attributes 
as well as the primary final diagnosis. It was used to develop a predictive instrument using 
logistic regression analysis to determine the probability that a patient has acute cardiac 
ischemia. Because of the care with which the data was collected and the large amount of 
data collected on each patient, this is an ideal database for comparing other technologies to 
that of logistic regression analysis. Our first step is to use the machine learning program 
ID3 to explore the kinds of decision trees that it will generate with the same data. ID3 
is representative of a class of machine learning programs that inductively generate decision 
trees from examples using statistical tests and heuristics to keep the trees small and limited 
to only those categorizations that are statistically justified. Since these technologies have
been enhanced to handle noisy and missing data, they are capable of handling a database
such as this one. We have a working implementation of ID3 (implemented at our institution
by a graduate student, Jonathan Amsterdam) which has been tested on a number of small
datasets. We are currently checking the data and designing experiments leading up to
running ID3 on the whole learning set later in the summer.
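

The attribute selection step at the heart of ID3-style tree induction can be sketched as follows. This is only an illustration of the standard information gain criterion, not the implementation mentioned above. The attribute names (st_elevation, age_over_60, ischemia) and the four records are hypothetical, and the sketch omits the statistical tests and the handling of noisy or missing data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attrs, target):
    """Attribute with the largest information gain (chosen as the next split)."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical records, invented purely for illustration.
ROWS = [
    {"st_elevation": "yes", "age_over_60": "yes", "ischemia": "yes"},
    {"st_elevation": "yes", "age_over_60": "no", "ischemia": "yes"},
    {"st_elevation": "no", "age_over_60": "yes", "ischemia": "no"},
    {"st_elevation": "no", "age_over_60": "no", "ischemia": "no"},
]

if __name__ == "__main__":
    print(best_attribute(ROWS, ["st_elevation", "age_over_60"], "ischemia"))
```

A full tree builder would apply best_attribute recursively to each subset of records until the subsets are pure or no split is statistically justified.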


Yeona Jang plans to conduct investigations (eventually leading to her Ph.D. thesis) on how
to learn technical rules in a medical reasoning program that incorporates both associational
and causal knowledge. The primary goal is to create a system that can learn from its
experiences; in particular, to be able to use an indication of whether its conclusions were
correct (or not) to critique and revise its knowledge and decision methods.

David Nehessi intends to continue his study of the applicability of formal, algorithmic learning
methods to problems of medical knowledge.
Intelligent Signal Analysis


Patil and Scott Greenwald have been investigating the use of AI methods to assist in the 
interpretation of physiologic signals, in particular the design and testing of algorithms that 
automatically analyze two leads of the electrocardiogram. The major emphasis of this work 
has been in the continued development of CALVIN, an expert system that exploits con- 
textual information in the ECG. CALVIN has been developed to enhance the detection of 
normal beats and isolated ventricular premature beats in the presence of severe electrode 
motion noise and QRS-like artifact. In addition, we also modified ARISTOTLE, a traditional 
arrhythmia detection algorithm. Greenwald has developed and evaluated a noise detection 
strategy using the ratio of the number of peaks to beats detected within a three second 
window. A ratio greater than a threshold signifies the presence of severe noise. 
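

A minimal sketch of such a peak-to-beat ratio test is given below. It is an assumption-laden illustration rather than Greenwald's detector: the peak definition (a local maximum above min_height), the threshold value of 3.0, and the function names are all hypothetical, since the report does not state them. Only the three second window and the ratio test come from the text.

```python
def count_peaks(signal, min_height):
    """Count local maxima that exceed min_height."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i] > min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    )


def is_noisy(signal, beat_times, fs, start, window_s=3.0,
             min_height=0.5, threshold=3.0):
    """Flag the window [start, start + window_s) as noisy.

    signal     -- ECG samples (one lead)
    beat_times -- times in seconds of beats reported by the beat detector
    fs         -- sampling rate in Hz
    threshold  -- hypothetical peaks-per-beat limit (not given in the report)
    """
    lo = int(start * fs)
    hi = int((start + window_s) * fs)
    peaks = count_peaks(signal[lo:hi], min_height)
    beats = sum(1 for t in beat_times if start <= t < start + window_s)
    if beats == 0:
        # No detected beats: any peak at all is unexplained activity.
        return peaks > 0
    return peaks / beats > threshold
```

With three detected beats in a window, for example, three peaks give a ratio of 1 and the window is accepted, whereas fourteen peaks give a ratio above 4 and the window is flagged.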


In the last year, the results of CALVIN’s detection performance were presented at the Com- 
puters in Cardiology Conference (September 1988) [138]. In addition, the work was discussed 
at the Association for the Advancement of Medical Instrumentation Conference (May 1989). 
Finally, we plan to present a working real time demonstration of the combined ARISTOTLE/CALVIN
arrhythmia analysis system on a MAC I at the 1989 Computers in Cardiology
Conference (September 1989).


The future direction of this work is focused on improving the detection of isolated atrial 
premature beats (S), isolated premature ventricular beats (V), and normal beats (N) in the
presence of severe noise. The following projects will help bring that goal closer. 


ECG Database Development: A database containing sinus arrhythmia and atrial ectopic 
activity needs to be created for developing and testing CALVIN’s performance on atrial 
arrhythmias. A database consisting of twenty 1/2 hour sections (10 for detector development 
and 10 for detector testing) will be collected for each of the following arrhythmia classes: 


1. isolated atrial premature beats;
2. atrial couplets, triplets, runs of atrial tachycardia, and paroxysmal atrial fibrillation;
3. mixed isolated atrial premature beats and isolated ventricular premature beats;
4. mixed atrial and ventricular couplets, triplets, and tachycardia;
5. real-world noise (primarily electrode motion artifact); and
6. sinus arrhythmia (normals will need to be collected from Beth Israel Hospital Holter
Laboratory).


Improving CALVIN’s Performance: The current CALVIN system works well in correcting
ARISTOTLE’s errors in classifying normal beats and isolated premature ventricular
beats in normal sinus rhythm. However, it continues to make mistakes matching hypothetical
sequences of beats to the raw data in two cases: 1) in regions where the heart rate is
moderately increasing or decreasing (say over 5 or so beats), and 2) in regions where the
heart rate is constant but where its estimate of the heart rate is inaccurate.


One possible solution to this problem is to improve our estimate of heart rate (using beats 
classified with high confidence within a wide context) and to “stretch” the hypothetical 
sequences of beats in a nonlinear fashion to account for heart rate variations. The following 
two projects would help us meet this goal.


One important question to answer is how interbeat intervals vary as a function of heart 
rate. For example, normal-to-ventricular beat coupling intervals (NV intervals) tend to be 
constant to a first approximation as heart rate varies. We need to collect statistics on the 
relation of NS, SN, NV, and VN intervals as a function of heart rate in order to determine 
how to “stretch” and match hypothetical sequences of beats for a particular heart rate. 


During periods of noisy ECG, the noise level may drop momentarily. During this brief 
period one or two heart beats may become quite apparent. Physicians frequently use these 
“landmark” beats as fiducial points to re-adjust their estimates of heart rate and their
expectations of beat locations. 


We need to develop a method to find these landmark beats, and to use them in improving 
our heart rate estimate and in selecting the correct hypothetical sequences of beats. One 
possible confidence measure to use to find these landmark events is the correlation coefficient 
of the data with the best matched beat template. If the correlation were above a conservative 
threshold, then our confidence in the beat’s identity would be high enough to update our 
heart rate estimator. 
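

The correlation test proposed above might look like the following sketch. It assumes that beat templates are fixed-length sample windows keyed by beat class, and it uses a hypothetical threshold of 0.95. The function names, the template shapes, and the threshold are illustrative assumptions, not part of CALVIN.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat segment carries no shape information
    return sxy / sqrt(sxx * syy)


def find_landmarks(signal, templates, threshold=0.95):
    """Return (index, label, r) for every window whose best-matching
    template correlates at or above `threshold`.

    templates -- dict mapping a beat label to a template of samples;
                 all templates are assumed to have the same length.
    """
    width = len(next(iter(templates.values())))
    hits = []
    for i in range(len(signal) - width + 1):
        window = signal[i:i + width]
        label, r = max(
            ((name, correlation(window, t)) for name, t in templates.items()),
            key=lambda pair: pair[1],
        )
        if r >= threshold:
            hits.append((i, label, r))
    return hits
```

Each hit could then be passed to the heart rate estimator as a high-confidence beat. A conservative threshold trades missed landmarks for fewer false ones.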


3.2.3 Probabilities and AI 


As part of our strong ongoing interest in the integration of probabilistic and artificial in- 
telligence methods, we are pursuing—and plan to continue to pursue—a number of related 
issues. Fundamental to all this work is the question: what is the most appropriate form in 
which to capture and represent probabilistic information in such a way that it is useful to 
physicians? 


Computations in Probabilistic Networks


Research over the past few years on the evaluation of Bayesian probability networks has led to 
the development of new algorithms for handling multiple paths between nodes. In particular,
the approach of Lauritzen and Spiegelhalter [196], while not circumventing the inherent 
exponential nature of the problem, seems to be significantly faster than other published 
approaches. This approach should be fast enough to handle networks with tens of nodes 
and several multiple paths. We are implementing the algorithm to determine its practical 
limitations and to apply it to problems of about that order of complexity. The speed of this 
algorithm should also make it possible to test the heuristic approach we are using on larger 
networks. The empirical tests we conducted over the past two years on the heart failure
program by entering actual cases and critiquing the differential diagnoses have convinced 
us that our current heuristic approach to the evaluation of the network (which has about 
150 intermediate nodes, about 300 possible terminal nodes, many multiple paths, and even 
forward loops) produces good hypotheses [216]. By inspection we have not found better 
hypotheses than the ones produced by the heuristic method, but we have not had an exact 
method for determining the best hypotheses. Our implementation of the Lauritzen and 
Spiegelhalter algorithm will allow us to test the performance of our heuristic algorithm on
somewhat simplified models. Investigation of the Lauritzen and Spiegelhalter algorithm has 
taken place and we are beginning the implementation, which should be completed shortly. 


Probabilistic Reasoning for Genetic Pedigrees 


We plan to continue a collaborative effort principally undertaken during the past year with 
Susan Pauker, responsible for the clinical genetics counseling program of the Harvard Community
Health Plan, and her genetic counseling colleagues. Pauker has a longstanding interest
in the application of probabilistic reasoning methods to her counseling practice [255][256].
Nomi Harris, as part of her Master’s thesis [155], has completed a program that employs
the techniques of Bayes networks [257] to compute a consultand’s risk of
abnormality for arbitrary pedigrees, including cases of inbreeding. The program handles 
diseases that may be dominant, recessive or sex-linked, and has facilities for dealing with 
incomplete penetrance, mutation, time-varying likelihood of expression, etc. A paper de- 
scribing this work has been accepted for the upcoming SCAMC meeting and is also a finalist 
in the student paper competition [154]. 
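

The kind of calculation involved can be shown on the smallest possible pedigree. The sketch below is not Harris's program, which handles arbitrary pedigrees through Bayes networks. It is a direct enumeration for one child of two known parents under an autosomal recessive model, with the function names and the genotype encoding ("A" normal allele, "a" disease allele) chosen here for illustration.

```python
from fractions import Fraction
from itertools import product


def child_genotype_probs(mother, father):
    """Mendelian transmission: each parent passes either allele with
    probability 1/2. Genotypes are two-letter strings such as "Aa"."""
    probs = {}
    for a, b in product(mother, father):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


def posterior_given_unaffected(mother, father, penetrance=Fraction(1)):
    """Genotype probabilities for a child observed to be unaffected.

    Recessive model: only genotype "aa" can be affected, and it is
    expressed with probability `penetrance`.
    """
    prior = child_genotype_probs(mother, father)
    likelihood = {
        g: (1 - penetrance) if g == "aa" else Fraction(1) for g in prior
    }
    joint = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(joint.values())
    return {g: p / total for g, p in joint.items()}


if __name__ == "__main__":
    # Healthy sibling of an affected child: both parents are carriers.
    print(posterior_given_unaffected("Aa", "Aa")["Aa"])                   # 2/3
    print(posterior_given_unaffected("Aa", "Aa", Fraction(4, 5))["Aa"])  # 5/8
```

With complete penetrance the unaffected sibling is a carrier with probability 2/3; with 80% penetrance the figure falls to 5/8, because an unaffected individual may still carry two disease alleles.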


We have now ported the computational core of that program to personal computers (the Macintosh
and a fully equipped 386-based PC), and are investigating the design of appropriate user
interfaces. In addition, we plan to put this program into the hands of practicing genetic
counselors within the next few months, asking them to perform retrospective evaluations
of some of their clinical cases and, based on feedback from their experience, outline what 
additional capabilities are necessary to support the analyses performed by these counselors. 
We also plan to explore the adoption of the Spiegelhalter and Lauritzen network algorithm 
to speed up the program on complex, interbred pedigrees. 


Extracting Probabilistic Information from Partial Databases 


We are also using the Selker and D’Agostino database (described above under “Learning”)
to investigate the probabilities in the Heart Failure Program. Since many clinical attributes 
of the patients were collected, it is possible to identify subsets of the database that have 
properties corresponding to some of the intermediate nodes in the model and determine the 
probability and confidence intervals for some of the findings given those states. It is not 
possible to do this for all of the nodes in the model because not all of the data was collected, 
some of the findings in the model do not exactly correspond to the data collected in the 
database, and no secondary diagnoses were recorded in the database. Even so, the data
gives us an opportunity to compare the actual occurrence of some findings in the model to
the probabilities given by our cardiology experts from their clinical experience and knowledge
of the literature. We are currently determining the correspondence between the model nodes
and the database attributes and determining which relations can be tested. 
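

One way to obtain such an estimate with a confidence interval is sketched below. The choice of the Wilson score interval is an assumption made here for illustration; the report does not say which interval is used. The record layout (a dictionary per patient) and the function names are likewise hypothetical.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives about 95%)."""
    if n == 0:
        raise ValueError("no cases in the selected subset")
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def finding_given_state(records, state, finding):
    """Estimate P(finding | state) from the cases that satisfy `state`.

    records -- list of dicts, one per patient
    state   -- predicate selecting cases that correspond to a model node
    finding -- key of a boolean attribute in each record
    Returns (point estimate, (low, high), subset size).
    """
    subset = [r for r in records if state(r)]
    k = sum(1 for r in subset if r[finding])
    n = len(subset)
    low, high = wilson_interval(k, n)
    return k / n, (low, high), n
```

The resulting interval can be compared directly with the probability supplied by the experts for the same link in the model. A wide interval signals that the subset is too small to confirm or contradict the expert estimate.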


Decision Tree Critiquer 


In the past year, we began a study of the efficacy of the decision tree critiquer (DTC) 
in detecting structural errors in decision trees. Recall that DTC was designed after first 
observing the kinds of errors that arose in trees built by trainees on the clinical decision 
making consultations service. In this ongoing study, initial trees presented by trainees and 
students are captured and translated from the PC environment to the symbolic environment 
by a neutral research assistant (a medical student). The critiquing program then lists the
potential structural errors identified by its knowledge base. Because of the annual turnover 
of fellows and more frequent turnover of students and visitors to the service, we anticipated 
that the frequency of errors identified would be of the same order of magnitude as was found 
two years ago when we initially developed our error catalog and began to code DTC. 


To our surprise, we have been identifying far fewer structural errors, suggesting that the 
educational process has been passed from trainee to trainee. At this point most of the 
“problems” identified by DTC are not true errors but rather represent instances in which 
relationships that might have been represented structurally have been incorporated into 
“binding expressions” within the decision trees. Such expressions are essentially microcoded 
programs that the current implementation of DTC cannot parse into relations among meaningful
concepts. We feel that this represents more than just a limitation in DTC. When decision
trees are used by the clinical service, they undergo an extensive debugging process, which 
in fact almost uniformly identifies problems and necessary refinements. Those problems, 
as identified by human experts (faculty), almost always lie within expressions within the 
“bindings,” probabilities, and utilities of a decision tree. This concordance of error locations 
suggests that our current classic formalism for problem representation, which hides certain 
relationships within algebraic expressions, is inadequate. 


One ready hypothesis might be that decision trees are a less adequate representation than 
influence diagrams of complex decision problems. In fact, the two formalisms appear to be 
complementary representations, with each making explicit certain relations that are “hidden”
in the other: in the tables of the influence diagram and in the binding structure of the
decision tree.


In the next year, we plan to allow DTC to process the decision trees created by a new crop 
of fellows (five new to our division, four new to decision analysis). We also plan to process 
a random selection of trees from the published literature, omitting trees created by our 
division or by former trainees. Unfortunately, the current DTC implementation is incapable 
of examining the details of the model contained within mathematical expressions (utilities, 
probabilities, bindings) and only critiques the structure of the decision tree. We plan to
expand this system, developing a syntax and rules for examining the content of expressions 
and the structure of Markov components. The examination of the content of expressions first 
requires that a consistent formalism be developed for representing the semantics contained 
within those expressions, creating appropriate links. Eventually, one would like to be able 
to link those expressions to a domain-specific knowledge base, but we will first explore the 
nature of relations among variables without reference to their medical content. 


We have already begun an implementation of the existing DTC in the same microcomputer environment
as DecisionMaker, using the Goldworks system. We hope to complete that implementation
in the next six months and begin to examine the feasibility of its use concurrent
with tree development. 


Microcomputer Implementations 


Substantial portions of the decision tree critiquing system have been ported to the IBM 
environment in the Goldworks system. That partial implementation is functioning, and we 
are exploring mechanisms for allowing it to interact conveniently with the evolving Decision- 
Maker environment. 


DecisionMaker itself has been expanded to include more transparent representation of sub- 
trees, and we have been exploring alternate visual representations of the implicit information 
contained within bindings. We also improved the scripting mechanism and completed the 
implementation of Monte Carlo simulations within our microcomputer environment. 


DecisionMaker is now a fairly stable product in a Pascal environment on the IBM compatible 
computer family. We plan to explore the feasibility of a direct port to the Macintosh world 
using the Borland Pascal environment. We also plan to use the new Borland object-oriented 
programming modules within their Pascal system to implement a portion of the DTC (now 
on Symbolics and Goldworks) system, and provide concurrent advice on tree construction 
and the interpretation of the results of sensitivity analyses. 


Applications 


We have been developing an extensive model comparing coronary bypass surgery, percu- 
taneous transluminal angioplasty, and conservative therapy in patients with angina. That 
complex Markov model required expansion of the capabilities of our modeling environment 
which has now been completed. The results of that cost-effectiveness analysis are now 
under review. We also developed models of screening for sickle cell disease, the selective 
use of cytomegalovirus immune globulin in renal transplant recipients, the expanded use of 
thrombolytic therapy in patients with acute myocardial ischemia, and the determination of 
occupational risk of HIV infection. 
4.1 Introduction


The 1988-89 year marks the end of the Real Time Systems Group’s 15-year history. RTS 
merged with the groups of Dally, Agarwal, and Knight to form the new Computer Architecture
Group, aimed at integrating the ideas, research, and resources of a currently fragmented
community.


The past year has seen substantial contraction of the RTS Group, with the departures of 
Robert Zak and Milan Minsky, along with the promotion of John Pezaris from staff to stu- 
dent. Previously reported research involving the development of the set-associative DRAM 
was brought to an orderly finish. The L Project continued as a focus of the group’s efforts, 
and the seeds were planted for a new effort to develop a high performance communication 
and packaging substrate for chip-level digital modules. Each of these projects is the subject
of a following section.


In Alewife, research has focused on large scale computer architecture and parallel processing 
software. The design of ALEWIFE, a scalable cache-coherent multiprocessor, is the vehicle
for much of our research and is a collaborative effort with Tom Knight of the Artificial 
Intelligence Laboratory. The ALEWIFE multiprocessor will support multiple models of 
computation including shared-memory, message passing and the data parallel model. 


4.2 The L Architecture 


Continuing research on the L architecture (by Ayers, Minsky, Jenez, Kommrusch, Puckett, 
Nguyen, Pezaris, Ward, and others) led to considerable progress despite relative disinterest 
on the part of potential funding agencies. Recent efforts have focused on the interface aspect 
of L, promoting its viability as a hardware-independent virtual machine semantics rather 
than on the specifics of any single hardware implementation. 


4.2.1 Macintosh L Implementation 


A second implementation of L was made operational on the 68020-based Macintosh II, pro- 
viding (1) a very different hardware platform than our previous re-microcoded Explorer, (2) 
an evaluation basis for L implementations on conventional processors, and (3) an environment
for the development of transparent trap-and-translate software.


Rather than directly executing L instructions, the Macintosh implementation of L traps 
when L code is encountered and transparently translates it to native 68020 instructions. 
The block of native code is cached in local storage, avoiding retranslation of active program
modules.
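

The trap-and-translate scheme can be pictured with the following toy sketch. Nothing in it reflects the real L instruction set or the 68020 code generator: the "L code" is a hypothetical list of (operation, constant) pairs, the "native code" is a Python function produced by toy_translate, and the class and attribute names are invented for illustration.

```python
def toy_translate(l_code):
    """Translate a hypothetical L program into a host-native function.

    l_code is a list of ("add", n) or ("mul", n) steps applied to one
    argument. The result is compiled once into a Python lambda rather
    than being interpreted step by step on every call.
    """
    expr = "x"
    for op, n in l_code:
        expr = f"({expr} {'+' if op == 'add' else '*'} {n})"
    return eval(f"lambda x: {expr}")


class TranslatingMachine:
    """Runs modules by translating them on first use and caching the result."""

    def __init__(self, translate):
        self.translate = translate
        self.cache = {}        # module name -> translated native code
        self.translations = 0  # how many times the translator ran

    def run(self, module_name, l_code, *args):
        native = self.cache.get(module_name)
        if native is None:
            # "Trap": untranslated L code was encountered.
            native = self.translate(l_code)
            self.cache[module_name] = native
            self.translations += 1
        return native(*args)


if __name__ == "__main__":
    machine = TranslatingMachine(toy_translate)
    program = [("add", 2), ("mul", 3)]
    print(machine.run("f", program, 4), machine.run("f", program, 5))
    print(machine.translations)
```

Running the same module twice invokes the translator only once, which is the point of caching the block of native code.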


The 68020 implementation uses local RAM as a cache for active chunks, following the scheme
developed by Ayers in 1987. This local memory serves both as a cache and as a site for
the highest volatility level of the homogeneous storage model of L. In our current (single-processor)
Mac implementation, references to non-resident chunks (missing chunk faults) result
in access to a disk-resident external chunk space. External chunk references are patched
(by the fault handler) to refer to local names as each chunk is imported, avoiding the time
and cost overheads associated with conventional memory management hardware. The current
scheme amounts to chunk-based virtual memory on the Macintosh, although the code
anticipates sharing of the external chunk space by several L processors each having local 
memory. 
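

The reference patching described above can be modeled in a few lines. This is a simplified sketch under assumptions of our own: chunks are lists of slots, external names are strings, local names are integers, and the class names (ExternalRef, LocalRef, ChunkStore) are hypothetical. It is meant only to show why a reference faults at most once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRef:
    ext_id: str   # name of a chunk in the disk-resident external space


@dataclass(frozen=True)
class LocalRef:
    name: int     # name of a chunk already resident in local memory


class ChunkStore:
    def __init__(self, external):
        self.external = external  # external chunk space: id -> list of slots
        self.local = {}           # local RAM cache: local name -> list of slots
        self.imported = {}        # external id -> local name
        self.disk_reads = 0

    def import_chunk(self, ext_id):
        """Bring a chunk into local memory (once) and return its local name."""
        if ext_id not in self.imported:
            self.disk_reads += 1
            name = len(self.local)
            self.local[name] = list(self.external[ext_id])
            self.imported[ext_id] = name
        return self.imported[ext_id]

    def follow(self, name, slot):
        """Follow the reference held in a slot of a resident chunk.

        The slot is assumed to hold a reference, not plain data. An
        external reference triggers the fault path: the target is
        imported if necessary and the slot is patched to a local name,
        so the same slot never faults again.
        """
        ref = self.local[name][slot]
        if isinstance(ref, ExternalRef):
            ref = LocalRef(self.import_chunk(ref.ext_id))
            self.local[name][slot] = ref  # patch in place
        return ref.name
```

After the first traversal every slot holds a local name, so later accesses proceed without any lookup in the external space.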


4.2.2 Binary Compatibility among Inhomogeneous Machines 


Our second L implementation sheds light on a number of interesting interface issues. Our
goal has been to establish the illusion of absolute binary compatibility among dissimilar
machines, without imposing the overhead of an interpretive mechanism. We have been able to
demonstrate such compatibility using simple (“toy”) programs in recent months. 


Our demonstration involves starting a small L program running on one system (e.g., the 
Explorer); asynchronously interrupting it at some arbitrary point in its execution; copying 
the entire network of chunks representing the computation (including data, program, and 
program state) to a dissimilar machine (e.g., the Macintosh); and continuing execution 
without losing information or consistency. Moreover, the program runs at full native-code 
speeds on both machines. 


4.2.3 Types as Approximations 


The L base language, roughly a variant of SCHEME with compiler- rather than interpreter- 
based semantics, allows most references to type information to be resolved at compile time 
rather than incurring runtime overhead. The L compiler attempts to provide a general- 
ity/cost/performance continuum between the extremes of (say) Lisp and C by providing for 
runtime types but attaching a nonzero cost to their (optional) use, rewarding the author of
a C-like L program with higher performance than that of an equivalent SCHEME-like one.


In addition to encouraging explicit type declaration by the programmer, this compiler flex- 
ibility places a premium on mechanisms for automatic type inference. Work in this area 
by Nguyen and Ward has led to an interesting alternative to the mechanisms of Milner and 
others; our type system promises, in addition to new inference possibilities, improved type 
support for such language features as polymorphic functions, subtypes, and side effects. 


Our type system views types as compile-time approximations of runtime values. An expres- 
sion or variable may have a range of types, corresponding to varying amounts of partial 
information which can be inferred. The most informative type of a value is the value itself,
while the least is the type any, which applies to all values. The type of a procedural object
is itself a procedure which maps types of the object’s inputs to types of its outputs, whence 
(again) an L procedure is its own most accurate type.


The proposed inference algorithm evaluates type expressions whenever possible during
compilation. As a special case, this scheme causes complete compile-time evaluation of simple
functional subexpressions. However, this evaluation is complicated by (1) the possibility 
of non-termination (since the type language is Turing universal), and (2) the presence of 
side-effects (since the evaluation scheme assumes a simple applicative semantics). Under 
various circumstances, including timeouts and assignments, the inference scheme will re- 
vert to a weaker approximation as the type of an expression. Thus if z is the target of 
two assignments of types 5 and 2.718, or alternatively of the weaker types integer and real, 
the algorithm will infer a type for z such as any, number, or (union 5 2.718). The use of 
approximations allows us to guarantee (1) termination of the inference algorithm, and (2) 
functionality in the type domain. It sacrifices, of course, any hope of completeness claims 
for our system: our compiler will fail to discover certain inferrable types if it is forced to 
discard information in its approximations. 
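

The fallback to weaker approximations can be illustrated with a small Python sketch. It is
not the Nguyen and Ward algorithm; it only assumes a hypothetical chain of approximations
(literal value, then integer or real, then number, then any) and shows how the types of two
assignments to one variable might be merged into the most informative type covering both.
The names Named, widen, approximations, and join are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    """A named (non-literal) type such as integer or any."""
    name: str


ANY, NUMBER, INTEGER, REAL = (Named(n) for n in ("any", "number", "integer", "real"))


def widen(t):
    """Return the next weaker approximation of t in the assumed chain."""
    if t in (INTEGER, REAL):
        return NUMBER
    if isinstance(t, Named):          # number and any both widen to any
        return ANY
    if isinstance(t, int):            # a literal value is its own best type
        return INTEGER
    if isinstance(t, float):
        return REAL
    return ANY


def approximations(t):
    """List t followed by every weaker approximation, ending with any."""
    chain = [t]
    while chain[-1] != ANY:
        chain.append(widen(chain[-1]))
    return chain


def join(a, b):
    """Most informative approximation of a that also approximates b."""
    covers_b = approximations(b)
    for candidate in approximations(a):
        if candidate in covers_b:
            return candidate
    return ANY
```

Under these assumptions, join(5, 2.718) and join(INTEGER, REAL) both yield NUMBER, one of
the answers the text permits for a variable assigned both values. A union type such as
(union 5 2.718) would be a more informative answer that this sketch does not attempt.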


4.2.4 Cartesian Network Relative Addressing 


Unlike conventional machine architectures, the programming model for L imposes no bound 
on the size of its addressable universe. To exploit this flexibility, work by Morrison has 
explored memory systems based on an addressing scheme called Cartesian Network-Relative 
Addressing (CNRA).


The CNRA architecture attempts to maximize scalability by using a novel addressing tech- 
nique that provides some of the advantages of both global shared memory models and “local” 
non-shared memory models. This addressing technique assumes that the multiprocessor is 
built with a direct intercommunication network. Addresses in the CNRA system are com- 
posed of a “routing” component and a “memory location” component. The routing com- 
ponent indicates a path through the interconnection network. (The origin of the path is 
the node on which the address resides.) The memory location component is the memory 
location to be addressed on the node indicated by the routing component. 
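

A minimal Python sketch of this two-component address follows. The 3D mesh coordinates,
the direction labels, and the names Address, resolve, and same_target are assumptions made
for illustration; the source does not specify how a route is encoded.

```python
from typing import NamedTuple, Tuple

Node = Tuple[int, int, int]

# Assumed encoding: each hop of a route names one mesh direction.
STEPS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


class Address(NamedTuple):
    route: Tuple[str, ...]   # path through the network, relative to the holder
    location: int            # memory location on the node the path reaches


def resolve(holder: Node, addr: Address):
    """Follow the route from the node holding the address; return (node, location)."""
    x, y, z = holder
    for hop in addr.route:
        dx, dy, dz = STEPS[hop]
        x, y, z = x + dx, y + dy, z + dz
    return (x, y, z), addr.location


def same_target(holder_a: Node, a: Address, holder_b: Node, b: Address) -> bool:
    """Pointer equality must compare resolved targets, not the raw relative addresses."""
    return resolve(holder_a, a) == resolve(holder_b, b)
```

Because the route is relative to the holder, two addresses with different contents can
name the same word, which is why testing pointer equality needs special treatment.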


This addressing system offers the unlimited address space provided by local non-shared 
memory models, but allows easy sharing of data structures in the style permitted by global 
shared memory machines. The thesis discusses how a practical CNRA system might be built. 
There are discussions on how the system software might manage the “relative pointers” in 
a clean, transparent way; solutions to the problem of testing pointer equality; protocols and 
algorithms for migrating objects to maximize communication locality; garbage collection
techniques; and other aspects of the CNRA system design. It is clear that the CNRA 
system is scalable (in terms of demonstrating a way to connect many processors). However, 
whether or not the system will work well will depend on the communication behavior of
large multiprocessor programs. Since this is not yet well understood, simulation will be
required to test the viability of the CNRA architecture.


4.3 NuMesh


This section reports very preliminary thoughts on a proposed new research effort involving 
Ward, Dally, Agarwal, Knight, and others. The idea is currently in its fetal stage. 


4.3.1 Introduction 


Over the past two decades, the backplane bus has dominated computer architectures as the 
mechanism for intermodule communications. The reasons for this dominance are simple 
and remain compelling: a well-designed bus provides a simple, extensible communications 
substrate which allows modules performing a variety of computational tasks (and produced 
by a variety of manufacturers) to be assembled into coherent systems. It induces a
Tinkertoy-set modularity at the system configuration level, allowing system designers to
build systems without redesigning every component.


The technical limitations of buses are well known, however. Since they serialize all system- 
level communications, they constitute a communication bottleneck, and one whose capacity 
remains roughly constant as the system size is increased. Moreover, the timing of a bus 
is constrained by a fundamental space/time tradeoff: the time taken by each transaction 
must accommodate the physical length of the bus, ensuring a relatively low communication 
bandwidth on all but trivially short buses. Other overhead, such as the need to arbitrate
the shared communication resource among competing requests, further reduces the viability
of the bus as a basis for communication in high performance systems.


The following paragraphs propose the development of a communications substrate which af- 
fords Tinkertoy-set modularity and very high performance communication in a constrained 
but interesting range of applications. The approach involves standardizing the mechanical, 
electrical, and logical interconnect among modules arranged in a (partially populated) 3D 
mesh whose lowest level communications follow pre-compiled systolic patterns. The attrac- 
tiveness of the scheme derives from the separation of its communications and processing 
components, and the standardization of the interface between them. This decoupling of 
computation from the communication substrate allows the continued exploitation of mass- 
produced processing elements, e.g., contemporary signal processing chips. 


The goal is a set of hardware modules and support software which allows high performance, 
special purpose multiprocessors to be configured for particular applications in a matter of 
hours. Applicability of processors so configured will be restricted to relatively static, limited- 
connectivity algorithms such as those found in signal processing, graphics, or other real time 
applications; this proposal does not address the problem of general purpose multiprocessing
(for which alternative proposals abound). It has the potential, however, of delivering
economical supercomputer power in an interesting but limited set of application domains. 


4.3.2 Modules 


Each component of our system is a computational module, perhaps occupying a two-inch 
cube (or conceivably much smaller). Each module contains common circuitry devoted to low
level communications and control functions, as well as one or more off-the-shelf chips which
perform computation. Initially, we envision modules containing a modern 60-MFlop DSP 
and (say) a modicum of additional memory; eventually, a repertoire of compatible modules 
offering varied functionality (sharing the communications circuitry) might evolve. 


The common circuitry (likely one or several ASICs) includes a communication/control processor
and a number of receiver/transmitter pairs which drive lines to neighboring modules. The
communication ports (and accompanying mechanics, connector technology, etc.) allow the
modules to be configured into a limited-connectivity network; our preference is a 6-neighbor
3D mesh, although other topologies are of course possible. Operation of the entire array is
synchronized to a fast clock, whose period is the minimum time necessary to transfer a data
word (32 bits?) between adjacent nodes. The limited distances (an inch) and fixed mechanics
should allow this time to be quite short, perhaps 10 nanoseconds or less. A variety of other
functions, such as trimming clock skew, intermodule control, and processor/communications
synchronization, are also performed by the common circuitry.


Initial prototypes will undoubtedly use pre-packaged off-the-shelf chips for the function- 
specific portion of each module. Assuming wild success of the NuMesh and bandwagon 
momentum among suppliers of high performance silicon, however, one might imagine each 
manufacturer packaging and bonding chips directly into a pre-constructed NuMesh package. 


4.3.3 Communications 


Each communications processor consists of a simple FSM which follows a periodic sequence 
of I/O transactions. It has a small number of registers, each of which is addressable by 
the DSP as an external memory location. The transition table of the FSM (in RAM) can 
be programmed to read inputs from various neighbors into registers and send outputs from
various registers to other neighbors on each clock cycle. In the most ambitious configuration, 
any port may be read or written (or perhaps both, since we presume the lines to be unidirec- 
tional) on each clock cycle. Implementation considerations may dictate further restrictions; 
e.g., allowing only one output datum (perhaps to several destinations) and one input datum 
per clock cycle. The latter restriction is suggested by an implementation involving internal 
input and output buses interconnecting separate receiver/transmitter chips for each port. A 
myriad of other compromises are possible, depending on resources at the technological level.


The general idea is that each module’s communications FSM be programmed to follow a 
periodic pattern of interactions with neighbors. Although the interactions may vary among 
processors, the periods will be identical. If module A transfers a word to its right-hand 
neighbor B on clock 37 of each period, then A’s FSM will be programmed to drive its lines 
to B on that clock, while B will be programmed to load in data from A. By appropriate design
of transition tables, arbitrary systolic communication patterns may be implemented among
processors. In some cases, words loaded by a module are destined to be read subsequently by
that module's DSP; in other cases, they are routed (typically on the next clock) to another 
neighbor without DSP intervention or even awareness. 
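

The periodic, branch-free schedule described above might be modeled as in the following
Python sketch. The table layout, the register names, and the Module, run, and demo names
are hypothetical; the sketch only assumes that every module follows a transition table with
a common period, and that a word moves one hop per clock.

```python
class Module:
    """One mesh node: a few registers plus a periodic transition table."""

    def __init__(self, name, table):
        self.name = name
        # table maps a clock slot (0..period-1) to the transfers made in that slot:
        #   {"send": [(register, neighbor_name)], "recv": [(neighbor_name, register)]}
        self.table = table
        self.regs = {}


def run(modules, period, cycles):
    """Advance every module in lock step for the given number of clock cycles."""
    for clock in range(cycles):
        slot = clock % period
        wires = {}
        # Phase 1: every transmitter scheduled for this slot drives its line.
        for m in modules:
            for reg, dst in m.table.get(slot, {}).get("send", []):
                wires[(m.name, dst)] = m.regs[reg]
        # Phase 2: every receiver scheduled for this slot latches its line.
        for m in modules:
            for src, reg in m.table.get(slot, {}).get("recv", []):
                m.regs[reg] = wires[(src, m.name)]
    return {m.name: m for m in modules}


def demo():
    """A sends to B in slot 1; B forwards to C in slot 2 with no DSP involvement."""
    a = Module("A", {1: {"send": [("out", "B")]}})
    b = Module("B", {1: {"recv": [("A", "pass")]}, 2: {"send": [("pass", "C")]}})
    c = Module("C", {2: {"recv": [("B", "in")]}})
    a.regs["out"] = 42
    return run([a, b, c], period=4, cycles=4)
```

In this model a receive with no matching send in the same slot raises a KeyError, a
reminder that the tables of neighboring modules must be generated together.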


Certain algorithms may benefit from flow control and other synchronization measures in their
underlying communications. These might be superimposed on the primitive (branch-free)
communication mechanism by software convention, allowing certain data words to contain
control information. It is possible that analysis of potential application code will suggest the
addition of hardware support for such control purposes.


Additional protocol provisions allow the communications FSMs, and perhaps the functional 
circuitry attached to them, to be programmed. This process is viewed primarily as a boot- 
strapping operation, and may be relatively slow; however, the potential for time-varying 
communication patterns may eventually be explored. 


4.3.4 Software 


The rapid prototyping of ad hoc multiprocessors depends on automation of various aspects 
of the design task, including (1) design of the network topology, (2) allocation of computational
tasks to processors, (3) specification of details regarding timing and direction of
communications for each module/clock pair, and (4) programming of the DSP. While steps 
(1) and (2) are the most challenging, they are amenable to partial solutions (e.g., involving 
interaction and direction from the designer) and benefit enormously from the restriction to 
static algorithms with time- and space-bounded components. 


Code generation involves an accurate, detailed model of DSP timing—perhaps including 
cache operation. However, hardware provisions (e.g., bit/register R/W sync bits, like I- 
structures) might provide some timing latitude in DSP-FSM synchronization. Placement 
and routing aspects of system design (mapping a graph of time-bounded computations to
a grid of processors) can probably benefit from progress in adjacent domains of algorithm
research. 


We emphasize that our choice of a 3D interconnect topology stems not from the 3D nature 
of intended applications but from the 3D nature of our physical universe. We anticipate that 
the NuMesh interconnect will perform well in any computation characterized by a static 
sparsely-connected communications graph amenable to efficient embedding in 3-space. 


4.3.5 LegoFlops as a Research Goal 


A major attraction of this scheme, and variants, is its promise of mind-boggling performance
in a limited but conspicuous class of applications. Unlike more ambitious approaches to
multiprocessor architecture, commitment of engineering talent, money, and corollary resources
is almost certain to produce splashy results (i.e., huge MFlops and MFlops/ parameters) and
very impressive demos (speech, graphics, etc.). By riding the coattails of highly engineered
DSPs, we leverage the real muscle of the remaining domestic semiconductor industry; indeed,
it is difficult to imagine TI and Motorola not vying strongly for an opportunity to participate
(and to implant their respective DSP chips). The technical risks are low: the architectural
schema can certainly be made to work; the software, while challenging at its most ambitious
extreme, admits many clearly practical compromises. Perhaps the biggest challenges involve
the basic physics of the system: mechanics, connectors, cooling, and power distribution. Here
there is unlimited opportunity for cleverness, optimization, and state-of-the-art engineering.


Like most system-building proposals, this one is resource intensive: to achieve its potential,
it will require staff engineers, VLSI fab, proselytizing and support of a user community, and 
the better part of a decade of LCS commitment. However, it seems to be an area where suffi- 
cient financial commitment virtually guarantees interesting and conspicuous results—results 
which, unlike the incestuous tools of the computer science community, enable breakthroughs 
in real (non-CS) applications. The ultimate attraction of this commitment to LCS and 
MIT may be the distinction it engenders in other areas: a period of MIT supremacy in 
speech recognition, imaging, dynamic graphics, control robotics, and a host of other client
disciplines. 


4.4 Alewife 


The Alewife multiprocessor consists of a set of processing nodes interconnected via a low 
latency network. A high speed processor, a large coherent cache, memory and a cache- 
memory-network controller constitute each processing node. The current version of the 
network uses a topology of the Omega [198] (or Banyan) class of networks and is circuit- 
switched to allow low latency communications. Our research also addresses scalable fat-tree 
[203] and low dimension direct networks [91] that display locality and can provide quick 
access to neighboring memory modules without requiring a full network traversal. The 
processor, called APRIL [207], permits rapid context switching through the use of multiple 
register files and includes support for efficient synchronization and handling of Futures. 


In the software arena, the parallel Mul-T system developed at the Laboratory for Computer 
Science in the Parallel Processing Group is being adapted for our use. The Mul-T system 
includes a production quality compiler for parallel applications. We have developed T- 
Mul-T, an address tracing system that produces traces of parallel applications written in 
Mul-T. T-Mul-T can also be interfaced to a cache-memory system simulator, which in turn 
interfaces to an interconnection network simulator. The coupling of trace generation and the 
memory system simulator allows rapid simulation of various system configurations without 
introducing time distortions in results; it also has speed advantages over a software processor 
simulator, and does not incur trace storage overhead. However, a processor simulator for our 
APRIL processor is also necessary, because APRIL is sufficiently different from the processor 
we are currently tracing. Such a simulator is currently being developed to replace the trace 
generation backend of our simulation system. 


The principal areas of our research in the past year included: 


1. Multiprocessor data collection tools and techniques;


2. The design of directory systems for large scale cache-coherent multiprocessors;


3. Low latency interconnection network design and analysis;


4. Investigation of new synchronization techniques;


5. Design of VLSI processors for parallel computers;


6. Parallel processing software and applications;


7. Exploiting locality to enable scaling of large scale multiprocessors; and


8. Performance modeling and evaluation.


The following is a brief description of each area. 


4.4.1 Multiprocessor Data Collection 


Continuing our efforts in parallel trace data collection, we now have a tracer called T-Mul- 
T that generates traces for parallel symbolic applications and is written under Mul-T, a 
parallel Lisp system described later. The first implementation was done for the Encore 
Multimax. The Mul-T kernel was modified by David Kranz to simulate an arbitrary number 
of virtual processors, running on only a single processor. The simulation switches to a 
different processor after each memory reference, emitting a packet for each reference. The
memory allocator was modified to make each processor allocate storage in its own area of
memory so that we could study the effects of locality. The global memory allocation used 
by Mul-T would not make sense in a large scale multiprocessor. The context switching and 
memory packet emission is controlled by having the compiler insert code into the instruction 
stream. The resulting simulation is very fast, only 20 times slower than Mul-T on the Encore 
multiprocessor itself, neglecting I/O time for the memory packets if they are to be written 
out to disk. 
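

The switching discipline can be sketched in a few lines of Python. This is an illustration
rather than the T-Mul-T implementation: each virtual processor is modeled as a sequence of
(operation, address) references, and the names interleave, area_base, and AREA_SIZE are
assumptions introduced here.

```python
from collections import deque

AREA_SIZE = 0x1000  # assumed size of each virtual processor's private allocation area


def area_base(vp):
    """Start of the memory area in which virtual processor vp allocates storage."""
    return vp * AREA_SIZE


def interleave(programs):
    """Run the virtual processors on one real processor.

    programs is a list of iterables of (op, address) references, one per virtual
    processor. The scheduler switches to the next virtual processor after every
    memory reference and emits one (vp, op, address) packet per reference.
    """
    ready = deque((vp, iter(refs)) for vp, refs in enumerate(programs))
    trace = []
    while ready:
        vp, refs = ready.popleft()
        try:
            op, address = next(refs)
        except StopIteration:
            continue                      # this virtual processor has finished
        trace.append((vp, op, address))
        ready.append((vp, refs))          # context switch after the reference
    return trace
```

Giving each virtual processor its own allocation area makes the owner of an address
recoverable from the trace, which is what allows locality effects to be studied.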


T-Mul-T does two things for us: 


1. It allows us to run a real Mul-T program on an arbitrary number of processors instead of
just 16 (the number our Encore machine has). Because the simulation switches virtual
processors after every memory reference, it gives a faithful simulation of a possible real
execution of the Mul-T program, but it isolates scheduling and parallelism issues from 
the bus contention problems that exist in the Encore machine. 


2. It gives us parallel traces for real programs that can be used in cache and memory 
network simulations to help understand the locality issues. 


A port of T-Mul-T to the DEC Microvax and the MIPS R2000-based DECstation 3100 
is also partially complete. We have gathered several large traces of symbolic applications 
written in Mul-T, including MODSIM (a functional simulator), BOYER (a theorem prover),
and several other applications.


In a joint effort with the IBM T. J. Watson Research Center, Mathews Cherian derived large
parallel FORTRAN traces using a “postmortem scheduling method” that can incorporate
multiple synchronization models. In this technique, a multiprocessor trace is created from
a memory reference trace of the uniprocessor execution of a parallel application. Using the
record of synchronization events contained in the uniprocessor execution trace, a postproces- 
sor can schedule tasks from the uniprocessor execution trace into a multiprocessor trace in 
which the synchronization sections are simulated assuming some model of synchronization. 
The scheduler simulates processors generating the requests in a round-robin fashion. Parallel 
FORTRAN traces of several popular benchmarks include SIMPLE, WEATHER, and FFT. 
We are using these traces in a wide variety of studies ourselves, and we plan to distribute our 
trace data to the research community and to industry. Efforts to trace these applications 
with modified algorithms to enhance program locality are also in progress. 
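

A highly simplified Python sketch of postmortem scheduling follows. The event format
("task", "ref", "barrier"), the cyclic assignment of tasks to processors, and the function
name postmortem_schedule are assumptions for illustration; the actual postprocessor and
its synchronization models are not described in enough detail here to reproduce.

```python
def postmortem_schedule(uni_trace, num_procs):
    """Turn a uniprocessor trace into a round-robin multiprocessor trace.

    uni_trace is a list of events:
      ("task",)       start of a new parallel task
      ("ref", addr)   a memory reference made by the current task
      ("barrier",)    all processors synchronize before continuing
    Returns a list of (processor, addr) pairs.
    """
    multi = []
    queues = [[] for _ in range(num_procs)]
    next_task = 0
    current = 0  # serial code outside any task is charged to processor 0

    def flush():
        # Issue one reference per processor in turn until every queue is drained.
        cursors = [0] * num_procs
        progress = True
        while progress:
            progress = False
            for p in range(num_procs):
                if cursors[p] < len(queues[p]):
                    multi.append((p, queues[p][cursors[p]]))
                    cursors[p] += 1
                    progress = True
        for q in queues:
            q.clear()

    for event in uni_trace:
        if event[0] == "task":
            current = next_task % num_procs   # assumed cyclic task assignment
            next_task += 1
        elif event[0] == "ref":
            queues[current].append(event[1])
        elif event[0] == "barrier":
            flush()                           # nothing crosses the barrier
            current, next_task = 0, 0
    flush()
    return multi
```

A different synchronization model would change only how flush orders the queued
references, which is the sense in which the method can incorporate several models.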


We continued tracing parallel C applications under the MACH operating system using the 
Vax T bit technique [279][134]. Kiyoshi Kurihara has this tracer running on a DEC Microvax
3200 and is modifying it to use MACH threads instead of UNIX processes to enable faster 
tracing. 


A slight modification to our parallel T-Mul-T tracer has also enabled the emulation of large
scale multiprocessors, where the underlying processor of the machine on which the simulator
runs substitutes for the processor in the multiprocessor being emulated. We have simulators
for cache/directory systems and interconnection networks, which can be plugged back to
back to provide the system backend to the processor emulator. The FORTRAN postmortem
scheduler can also be used as the backend to the multiprocessor emulator.


4.4.2 Large Scale Cache Coherence 


David Chaiken and Mathews Cherian worked on directory schemes and synchronization in 
large scale multiprocessors; earlier studies of directory schemes were limited to small scale 
systems of 4 to 16 processors. We investigated the scalability of limited directory schemes [10] 
for cache coherence in the large scale. In a limited directory scheme, the number of pointers 
in the memory for each block can be less than the number of caches in the system. Such a 
scheme works because of the temporal locality property of processors referencing a given
memory block.
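

As a rough illustration, a limited directory without broadcast can be modeled as below.
The class name LimitedDirectory, the oldest-first eviction of a pointer when the entry is
full, and the invalidation counter are assumptions of this Python sketch, not details taken
from the schemes in [10].

```python
class LimitedDirectory:
    """Directory that keeps at most num_pointers sharers per memory block."""

    def __init__(self, num_pointers):
        self.num_pointers = num_pointers
        self.sharers = {}          # block -> list of cache ids, oldest first
        self.invalidations = 0

    def read(self, cache, block):
        """Record a read; if the entry is full, invalidate the oldest copy."""
        pointers = self.sharers.setdefault(block, [])
        if cache in pointers:
            return
        if len(pointers) == self.num_pointers:
            pointers.pop(0)        # assumed policy: evict the oldest pointer
            self.invalidations += 1
        pointers.append(cache)

    def write(self, cache, block):
        """Record a write; every other cached copy must be invalidated."""
        victims = [c for c in self.sharers.get(block, []) if c != cache]
        self.invalidations += len(victims)
        self.sharers[block] = [cache]
        return len(victims)
```

With two pointers, a third reader of the same block forces one invalidation at read time;
when few caches share a block at once, such overflow invalidations should be rare.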


Mathews Cherian wrote a cache and directory simulator, suitable for simulating large scale
systems, that provides several statistics on cache coherence schemes, such as the effects of
cache size, block size, number of directory pointers, and the number of processors. The
simulator also provides traffic rates of schemes that do not cache shared variables, and can
distinguish between shared and private traffic, and between synchronization and
non-synchronization traffic. On the slate for the future is obtaining statistics on a
pointer-chaining directory based cache coherence scheme [76].


David Chaiken extended the work on directory schemes and wrote a simulator to reflect more 
details of the cache coherency protocol. This program can simulate a fully-acknowledged 
protocol that guarantees sequential consistency, correctly interacting with a rapid-context- 
switching processor. The simulator is being extended to handle a weak-coherence protocol 
that fields processor-issued fence instructions and outstanding memory operations, and to 
handle access to full/empty bit synchronization requests. This program is intended to be
part of a processor/cache/memory/network simulator that
can be used to perform a detailed analysis of the architecture that we are proposing. 


Results from simulations of parallel FORTRAN applications show that these limited di- 
rectory schemes do scale for some applications. But for other applications, widely shared 
synchronization objects and sharing of words within cache blocks reduce their performance 
to almost that of a scheme that does not cache shared objects. Consequently, our current 
focus is on methods for restructuring parallel programs to exploit caches. Our results for 
applications written in Mul-T showed much better performance because they lack the
widespread sharing prevalent in the FORTRAN applications.


A major observation was that synchronization references are another impediment to scala- 
bility. Because a large number of processors simultaneously access synchronization variables, 
excess traffic to a single hot-spot location results. Large scale, cache-coherent multiproces- 
sors suffer significant amounts of invalidation traffic due to such synchronization reference 
patterns. Large multiprocessors that do not cache synchronization variables are often more
severely impacted. If this synchronization traffic is not reduced or managed adequately,
synchronization references can cause severe congestion in the network. We are investigating
new scalable synchronization methods that do not incur excessive hardware cost [6]. A later
section will describe our work that addresses this issue. 


As a sampling of our results on the performance of directory schemes for cache coherence,
Figure 4.4.2 shows an invalidation histogram for a 64-processor simulation of Dir_N NB driven
by a trace from the SIMPLE application. Dir_N NB corresponds to a directory scheme that
uses no broadcasts and has N pointers. The graph shows the histogram of the number 
of invalidations required during a write to a previously clean block. The graph shows the 
percentage of writes which resulted in invalidations to up to 12 caches. Writes resulting in 
invalidations of greater numbers of caches were proportionately insignificant. In over 95% 
of the times that an invalidation occurred, a block had to be invalidated from no more 
than three caches. This small number of invalidations compared to the possible maximum 
(i.e., 64) implies that for the common case the directory need have just a few pointers to
encode the locations of the shared blocks. Invalidation histograms for FFT and WEATHER 
had a corresponding figure of over 99%. Synchronization references accounted for all the 
invalidations involving roughly 10 or more caches—a definite problem. 


On a further investigation of the scalability of cache coherence schemes, we observed that 
with large block sizes, concurrent accesses of a block by several processors caused excess 
invalidation traffic. For example, halving the block size to 8 bytes from 16 almost halved the
network request rate. Our current efforts are aimed at compiler techniques to make caches 
viable by reducing this interference effect. 


David Chaiken is working on semantic models for shared memory. Ongoing research includes 
proving that our directory scheme implementation does conform to our definition of cache 
coherence [75]. The design of a cache-directory and network communications controller, to be 
used in a large scale multiprocessor, is in progress. The chief issues being addressed are: the 
programmability and the implementation efficiency of various shared-memory programming
paradigms, such as strong serialization versus weak ordering, supporting full-empty bits in
the cache/memory controller, and tradeoffs in controller design to support context switching, 
such as re-issuing instructions versus pipeline freezing. 


4.4.3 Interconnection Networks 


We analyzed interconnection network architectures that can best exploit the lower average 
traffic intensity of cache-coherent systems. Analytical evaluations with packet-switched and 
circuit-switched networks, assuming similar speeds for the switch nodes, show that circuit- 
switching can be superior to packet-switching in the medium scale (256-1000 processors). 
Our simulations with the parallel FORTRAN traces also indicate that directories yield better 
processor utilization than a scheme that does not cache shared data. The relative advantage 
of caching can be further enhanced by clever program restructuring to exploit fast access to 
cached data. 


Figure 4.4.3 shows the processor utilization for packet- and circuit-switched networks for a 
limited directory scheme with four pointers and a scheme that does not cache shared data 
for the FFT application. We see that circuit switching networks yield better performance 
up to about a thousand processors. Directory schemes are also superior to the non-caching 
schemes, although not by much. As mentioned earlier, we are investigating methods to
improve the benefits of large coherent caches.


Figure 4.2: Processor utilization for packet and circuit switched networks for a limited
directory scheme with four pointers and a scheme that does not cache shared data. The
application is FFT.


Gino Maa has written a fully-configurable circuit-switched interconnection network simulator 
which is being used to model the performance of proposed machine architectures and to 
validate the effectiveness of our analytical performance models. It allows us to evaluate 
the impact of alternative processor architectures, cache and coherence protocol designs, 
and network topologies. A packet-switched version, being written by Sue Lee and Gino
Maa, will be operational shortly, so that tradeoffs between circuit- and packet-switching
can be examined via simulation under various system configurations. The simulator works 
with both live and static frontends: with live sources such as an instruction-set interpreter,
block/resume handshaking signals are generated at the interface to exert back pressure on
the execution and scheduling behavior of the processors. With static trace inputs, trace
skew statistics are collected to provide a qualitative confidence measure on the integrity of the
simulation data. In either mode, the network input traffic may be “pre-filtered” via a memory 
cache simulator. With the simulators, we are already collecting useful data which is guiding 
and confirming our design choices. One observation we made with our simulation thus far 
is that purely static backends that drive network simulations can be inaccurate. Measured 
maximum skews between the trace generated streams of various processors and those during 
network simulations were over a million references for a total simulation reference length of
20 million references! 


A VLSI implementation of a circuit-switched network chip is in progress under the direction
of Tom Knight [101]. Knight is also investigating high-density 3-D button-board packaging
for the interconnection network.


4.4.4 Synchronization


We developed a new technique for efficient synchronization called adaptive backoff
synchronization [6]. A purely software approach, adaptive backoff synchronization helps
reduce network contention due to fine-grain synchronization accesses across a network. Our
technique can also help reduce hot-spot contention in large scale networks without resorting
to hardware-intensive solutions like combining networks [261] or global synchronization
logic [164]. We are also investigating the application of these adaptive backoff schemes to
reduce contention in source-responsible circuit-switched networks.


Our adaptive backoff methods use a synchronization state to reduce polling of synchronization 
variables. Our simulations show that when the number of processors participating in a 
barrier synchronization is small compared to the time of arrival of the processors, reductions 
of 20% to over 95% in synchronization traffic can be achieved at no extra cost. In other 
situations, adaptive backoff techniques result in a tradeoff between reduced network accesses 
and increased processor idle time. 
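

The tradeoff described above can be illustrated with a small simulation. This is only a 
sketch under stated assumptions, not the algorithm of [6]: the function name 
barrier_polls, the cycle-based timing, and the rule "wait in proportion to the number of 
processors still missing" are all hypothetical choices made for illustration. 

```python
def barrier_polls(arrival_times, backoff):
    """Count accesses to a barrier variable while processors wait.

    arrival_times: cycle at which each processor reaches the barrier.
    backoff(missing): cycles to wait before polling again, given how many
    processors were still missing at the last poll (hypothetical rule).
    Returns (total polls, total cycles idled after the barrier opened).
    """
    n = len(arrival_times)
    release = max(arrival_times)      # the barrier opens at the last arrival
    polls = 0
    idle_after_release = 0
    for t in arrival_times:
        now = t
        while True:
            polls += 1                # one network access to the barrier state
            arrived = sum(1 for a in arrival_times if a <= now)
            if arrived == n:
                idle_after_release += now - release
                break
            # The synchronization state (how many are missing) sets the delay.
            now += max(1, backoff(n - arrived))
    return polls, idle_after_release


arrivals = [0, 100, 200, 300]
spin = barrier_polls(arrivals, lambda missing: 1)              # poll every cycle
adaptive = barrier_polls(arrivals, lambda missing: 50 * missing)
aggressive = barrier_polls(arrivals, lambda missing: 120 * missing)
```

With these made-up arrival times, the adaptive rule cuts polling sharply without adding 
idle time, while the more aggressive rule polls even less but overshoots the release 
point, which is the idle-time cost mentioned above. 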


We are also studying software combining [316] to determine the extent to which a directory 
cache coherence scheme can efficiently support fine-grain barrier synchronization. By using 
the postmortem scheduler for FORTRAN traces, along with some additional postprocessing 
software to simulate the effect of software barrier trees, Kiyoshi Kurihara is investigating 
methods to reduce synchronization costs in cache-coherent multiprocessors. The postpro- 
cessing program locates and changes spin-lock addresses to simulate a combining tree effect. 
Applications to both static and dynamically created barriers are being studied. To obtain 
results from the MACH tracing package, Kiyoshi is modifying the barrier macros to use 
combining trees and adaptive backoff methods. 


4.4.5 Processor Design 


We are investigating novel VLSI processor architectures for large scale multiprocessor 
systems. A processor called APRIL is being designed by Beng-Hong Lim and Dan 
Nussbaum [207]. This processor borrows heavily from the MARCH processor design by Bert 
Halstead and the Stanford MIPS-X processor [163], but differs substantially from the two. 
Unlike MARCH, APRIL has hardware interlocks in the pipeline, does not interleave process 
threads, and uses software thread scheduling. Unlike MIPS-X, it allows multiple hardware 
contexts, and has hardware support for synchronization and Futures. The chief issues being 
addressed in this design are rapid context switching, fast trap handling, high single thread 
performance, hardware support for synchronization and futures, and register file 
organization. 


An important result of our study has been identifying the specific hardware-software trade- 
offs for achieving overall high system performance. Some examples include hardware versus 
software for fine-grain task management and scheduling in a multithreaded processor, and 
hardware provided synchronization primitives such as fetch-and-op versus software synthe- 
sized primitives from basic interlocked load/store instructions. We currently have a pre- 
liminary instruction-set specification. A Mul-T compiler for this processor and a detailed 
simulator are also being written. Beng-Hong Lim has written an instruction-level simulator 
that recently ran the ubiquitous Fibonacci program. 


4.4.6 Parallel Processing Software 


David Kranz’s work centered around Mul-T [189]. Mul-T is a parallel Lisp system, based 
on Multilisp’s future construct [151], that was developed to run on an Encore Multimax 
multiprocessor. Mul-T is an extended version of the Yale T system [270][271] and uses the 
T system’s ORBIT compiler [188] to achieve “production quality” performance on stock 
hardware—about 100 times faster than Multilisp. Mul-T shows that Futures can be imple- 
mented cheaply enough to be useful in a production-quality system. Mul-T is fully opera- 
tional, including a user interface that supports managing groups of parallel tasks. People 
at other universities, labs, and companies are using Mul-T, and useful feedback is expected. 
(See [189] and the Parallel Processing Group report for more details.) 
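

For readers unfamiliar with the future construct, the following sketch shows the idea 
using Python's standard concurrent.futures module. It is an analogy only, not Mul-T 
code: in Multilisp-style systems a future is touched implicitly when its value is used, 
whereas here the wait is the explicit call to result(). The function names are 
hypothetical. 

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Plain sequential Fibonacci, used as the unit of work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_sum_fib(ns, workers=4):
    """Start one future per argument, then combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Roughly analogous to wrapping each call in (future ...).
        futures = [pool.submit(fib, n) for n in ns]
        # Asking for a result blocks until that future has a value.
        return sum(f.result() for f in futures)


total = parallel_sum_fib([10, 11, 12])
```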


Mul-T is useful as a real system for parallel programming but suffers because it is difficult 
to do performance evaluation. We also do not want to limit ourselves to bus-based multi- 
processors such as the Encore Multimax. For large scale multiprocessors it will be necessary 
to examine the effects of locality on performance. In order to get the data necessary to 
investigate these issues, David Kranz re-engineered Mul-T to get T-Mul-T, described earlier. 
T-Mul-T runs on an arbitrary number of processors independent of the number available in 
the host multiprocessor. 


In collaboration with Susan Owicki, DEC Systems Research Laboratory, Palo Alto, we are 
investigating affinity-based process scheduling techniques for improving the locality of mem- 
ory referencing in multiprocessors. This work uses analytical models of the performance 
of multiprogrammed single processor caches [8]. An analytical model of performance of 
multiprocessor caches has also been derived to be used in this study [251]. 


A continuing effort is the development of large parallel applications. A substantial bench- 
mark program, SIMPLE, is being parallelized and ported to Mul-T, a parallel dialect of Lisp. 
It is a finite-difference numerical analysis program from the Lawrence Livermore Lab, which 
has become one of the standard benchmarks for evaluating existing and proposed high 
performance computers. The parallelization can be conditionally compiled to efficiently target 
a wide range of multiprocessors (from 16 to hundreds of processors). Other programs already 
developed for Mul-T include parallel matrix multiply, Permute, and Modsim. 


4.4.7 Multiprocessor Locality Studies 


Caches can prove beneficial in large scale multiprocessor environments only if we can exploit 
locality in multiprocessor memory referencing to a much greater extent than we have been 
able to thus far. Our measurement studies of parallel application traces confirm this need. Our 
efforts in this direction are summarized next. 


Ongoing work aims at providing an integrated strategy to implement an efficient storage 
hierarchy in shared-memory multiprocessors. Currently, parallel traces and application 
programs are being used, along with the simulation tools, to analyze and characterize locality 
properties in the memory access traffic. This is preparatory work with the goal to explore 
ways to exploit memory access locality to reduce access latency and interconnection band- 
width requirements. 


We have a new model representing memory referencing locality in multiprocessor systems [7]. 
This locality model, suitable for multiprocessor cache evaluation, is derived by viewing 
memory references as streams of processor identifiers directed at specific cache/memory blocks. 
This viewpoint differs from the traditional uniprocessor approach that uses streams of 
addresses to different blocks emanating from specific processors. Our view is based on the 
intuition that cache coherence traffic in multiprocessors is largely determined by the number 
of processors accessing a location, the frequency with which they access the location, and the 
sequence in which their accesses occur. The specific locations accessed by each processor, 
the time order of access to different locations, and the size of the working set play a smaller 
role in determining the cache coherence traffic, although they still influence intrinsic cache 
performance. Gino Maa has some initial results that show that these processor references 
directed to a memory block display the LRU stack property. If we succeed in showing this 
is indeed true across a large set of parallel applications, then the abundant literature on 
LRU stack evaluation for single processors can be straightforwardly used in evaluation of 
multiprocessor performance. 
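

The change of viewpoint can be made concrete with a short sketch. It regroups a trace of 
(processor, block) references into one stream of processor identifiers per block, and 
then computes classical LRU stack distances over such a stream. The trace format and the 
function names are assumptions made for illustration; this is not the model of [7] 
itself. 

```python
def per_block_streams(trace):
    """Turn (processor_id, block) references into per-block processor streams."""
    streams = {}
    for pid, block in trace:
        streams.setdefault(block, []).append(pid)
    return streams


def stack_distances(stream):
    """LRU stack distance of each reference in a stream of processor ids.

    The distance is the 1-based depth of the processor in the LRU stack at
    the time of the reference, or None for its first reference to the block.
    """
    stack = []                      # most recently referencing processor first
    distances = []
    for pid in stream:
        if pid in stack:
            depth = stack.index(pid) + 1
            stack.remove(pid)
        else:
            depth = None
        stack.insert(0, pid)
        distances.append(depth)
    return distances


trace = [(0, "A"), (1, "A"), (0, "B"), (0, "A"), (0, "A"), (2, "A"), (1, "A")]
streams = per_block_streams(trace)
dist_a = stack_distances(streams["A"])
```

A stream dominated by small distances means that a block is mostly re-referenced by the 
processors that touched it most recently, which is the kind of locality the text 
describes. 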


4.4.8 Multiprocessor Performance Modeling and Evaluation 


Analytical models of computer performance become ever more important as we scale mul- 
tiprocessors to hundreds or thousands of processors, where the computational needs of sim- 
ulations far exceed those available to us now. In addition to the simulation and analytical 
modeling systems described in the previous sections, we developed the following performance 
evaluation models. 


We developed a model of multiprocessor cache performance when coherence is enforced by the 
software [251]. A similar model for the performance of hardware-enforced cache coherence 
that takes into account the effects of the increase in invalidations as more processors are 
added is being developed in collaboration with Susan Owicki. 


We have implemented several analytical models of network performance to predict effec- 
tive processor utilization taking into account delays due to cache and network accesses and 
contention. The models are driven with access rates and access sizes measured from our 
benchmark traces. These models allow quick estimation of performance measures for 
various network configurations and numbers of processors. 


Minor Huffman extended our trace compaction technique based on a model of the spatial 
locality in programs [5] to improve both the compaction rate and cache performance simulation 
accuracy [165]. 


sors. Lecture given at MIT Laboratory for Computer Science, March 1988. 

S. Ward. The L project. Lecture given at NeXT, Inc., Palo Alto, CA, November 1988. 


5.1 Introduction and Overview 


Our group is interested in general purpose parallel computation. Our approach is centered 
on: 


• Declarative, implicitly parallel languages. 


• Dataflow architectures, which are scalable because of their tolerance of increased 
memory latencies and support for frequent synchronization. Our vehicles for research 
include an abstract “Explicit Token Store” architecture (ETS), a hardware prototype 
implementation of ETS (Monsoon), various software emulators (Gita, MINT), a new 
proposed architecture called P-RISC, and a software emulator for it. 


• Sophisticated compiling and runtime systems for Id, both for dataflow and other 
architectures. We have also explored the use of dataflow compiling for an experimental 
persistent programming language to tolerate disk latencies by exploiting parallelism. 


• Applications programs to guide the language, compiler, and architecture research. 


Our main research vehicle for programming languages is Id, which is a mostly functional 
programming language. We completed the basic type system and are exploring the use of a 
simplified version of a new overloading mechanism due to Phil Wadler [306]. Id is a nonstrict 
language for more parallelism, but nonstrictness is not achieved via laziness, as is usually 
the case. Instead, we have explored the implications of using explicit constructs for lazy 
evaluation to deal with infinite structures. For nondeterministic access to shared state, we 
have developed a new construct called a “manager” that is similar to, but more flexible than 
monitors and also allows more concurrency. We have also explored a few other experimental 
language designs: a language with naming environments as first class objects, and a language 
for signal processing. Our group is well represented in the international committee that is 
designing the new functional programming language Haskell. 


On the more theoretical side, we have formalized Id’s operational semantics using rewrite 
rules, and have been able to prove results about determinacy and to be more precise about 
such concepts as termination, errors, etc. We have also studied optimal interpreters for the 
lambda-calculus. 


We have ported a subset of Id World, our programming environment for Id, to the UNIX 
environment. This should make Id available to a much larger audience. The UNIX version 
lacks the graphics of the original Lisp Machine version; this work remains to be done. 


Last year, we reported that our research results had reached a level of maturity where we 
were ready to embark on the construction of a real dataflow machine within the next few 
years using the Monsoon processor architecture (Project Dataflow). Towards that end, we 
held a meeting in March 1988 with prospective industrial partners. Over the last year, 
Motorola has emerged as our partner; they are setting up a Cambridge research laboratory, 
and will participate actively in the construction of the Monsoon system. 


A wire-wrap prototype of a processor using the Monsoon dataflow architecture has been 
running small handcoded programs since September 1988, and compiled code since 
December. It has been used to guide the design of the printed-circuit Monsoon board (part 
of Project Dataflow). We continued to make progress on the design and implementation of 
the Monsoon interconnection network, consisting of PaRC switching chips and high speed 
data links. We have begun work on the design of an I-structure memory board for Monsoon. 


We have incorporated more optimizations in the Id compiler, and are moving its target away 
from the Tagged Token Dataflow Architecture to an Explicit Token Store model (ETS), 
of which Monsoon can be considered a specific implementation. We began to look very 
seriously at the runtime system and the control of parallelism in Id programs for better 
resource management, and have implemented several experimental mechanisms to that end. 


Our repertoire of Id applications continues to grow and includes DNA sequence analysis, 
airport landing approach planning, computational fluid dynamics, image processing, and 
simulated annealing. 


Our architecture research has also moved further in the direction of achieving a synthesis 
between von Neumann and dataflow ideas. We proposed a new architecture called P-RISC 
(for “Parallel RISC”), and have begun simulation and compilation studies. 


Based on the I-structure notation in Id, we have designed a “functional database language,” 
in which data do not change—update transactions specify new versions of a database. We 
are implementing this database language, using ideas from P-RISC compilation to exploit 
parallelism to hide disk latencies. 


5.2 Personnel 


After finishing his Ph.D. thesis in August 1988, Greg Papadopoulos became a member of 
the research staff, working as the chief architect for the Monsoon prototype processor in Project 
Dataflow. 


After completing his Ph.D. with Paul Hudak at Yale, Jonathan Young joined us as a member 
of the research staff, working on the compiler backend and runtime system for Monsoon. His 
research is in compile-time semantic analysis and optimization of functional programs. 


Paul Johnson joined our research staff and has been working on porting the existing Id World 
to UNIX machines. 


Arthur Altman joined CSG in January 1989 as a visiting researcher from Texas Instruments, 
to study the dataflow approach to programming languages and architectures when applied 
to problems in image understanding. 


After finishing his Ph.D. in May 1988, Ken Traub stayed on as a research staff member. 
In early 1989, he joined the Cambridge Research Center of Motorola, Inc., the industrial 
partner on the Monsoon project. 


It is with great sadness that we record the passing of Bhaskar Guha Roy on March 23, 1989. 
He worked first with Jack Dennis and later with Prof. Nikhil. He fought an incredibly 
courageous, year-long battle against liver cancer, during which he managed to write his 
Ph.D. thesis proposal and set up his committee. 


5.3 Programming Languages 


5.3.1 Id 


In September, we released the reference manual for Version 88.1 of the Id programming 
language [244], which augmented the language with constructs for loop bounding. 


5.3.2 Types and Overloading 


During the summer and fall of 1988, Shail Aditya revised and upgraded the type checking 
system of the Id compiler to incorporate changes from Id’87 to Id’88. This involved the 
addition of several key features to type analysis, viz., algebraic data types, constructor case 
analysis and abstract data types. Further, the type checker was made totally incremental at 
the procedural level. Thus, in the version currently installed, the user can compile individual 
procedures interactively from the editor, in any order. The type checker, installed as a 
module in the Id compiler, incrementally assembles enough information to check the type 
consistency of the accumulated program at each interactive step. Using this information, 
the runtime environment is able to double check the type consistency of all the procedures 
in the invocation graph just before execution. The user is notified in case of any discrepancy 
and the appropriate section of the program can be corrected and recompiled. 


During the winter and spring of 1989, Shail Aditya worked on a mechanism for the resolution 
and compilation of overloaded operators and general user-defined identifiers. The idea is 
a simplification of the system proposed by Wadler and Blott [306] which has been adopted 
in Haskell. Unlike previous overloading schemes, this one is not ad hoc. It is capable 
of expressing “recursive overloading”, e.g., if “+” is already overloaded on integers and 
floats, then it can also be overloaded to mean addition of lists of integers and floats and, 
inductively, on lists of lists of integers and floats, etc. There is a systematic way of resolving 
this overloading. 


The type checker with overloading resolution is currently under test with regard to 
efficiency of compilation and execution. We are conducting experimental tests with existing 
Id programs including large scientific codes such as SIMPLE. It will be installed in the Id 
compiler in the near future. The proof of consistency of the incremental type system and the 
details of the overloading mechanism are due to appear in Shail’s forthcoming S.M. thesis. 


The straightforward resolution of overloading results in some inefficiency because a procedure 
that uses the symbol “+” is implemented as one that receives an addition function as a 
parameter, which is applied using a general function call. It remains to be seen how this 
can be optimized through a process called “specialization”, where separate versions of the 
procedure are compiled, one for each implementation of “+” that is of interest. 
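

The ideas in this section can be sketched in a few lines. The example below is written 
in Python rather than Id, and every name in it is hypothetical: it shows a generic 
procedure that receives "+" as a parameter, an inductive rule that builds addition on 
lists from addition on their elements, and a hand-written specialized version of the 
same procedure. It illustrates the general function-passing scheme described above, not 
the actual output of the Id compiler. 

```python
def add_int(a, b):
    """The implementation of "+" on integers."""
    return a + b


def add_list(add_elem):
    """Inductive rule: given "+" on elements, build "+" on lists of them."""
    def add(xs, ys):
        return [add_elem(x, y) for x, y in zip(xs, ys)]
    return add


def double(add, x):
    """Generic procedure: the resolved "+" arrives as an extra parameter
    and is applied through a general function call."""
    return add(x, x)


def double_int(x):
    """Specialized version for integers: "+" is applied directly."""
    return x + x


# Resolution picks the implementation from the type of the argument.
a = double(add_int, 3)
b = double(add_list(add_int), [1, 2])
c = double(add_list(add_list(add_int)), [[1], [2, 3]])
```

The nested call add_list(add_list(add_int)) corresponds to the "lists of lists" case: 
the resolution is systematic because each level is obtained from the one below it. 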


5.3.3 Lazy Evaluation 


Id has nonstrict semantics, which means that a procedure or data constructor application can 
produce a value before the values of its arguments are known. Traditionally, languages with 
nonstrict semantics have been implemented using lazy evaluation, where nothing is evaluated 
until it is known that it is needed for the result. Unfortunately, when an expression is needed, 
a lazy evaluator would have already paid the overhead of building a closure for the expression 
and rescheduling it. Further, it would have lost the opportunity of evaluating it concurrently 
with other computations. For these reasons, we choose not to use lazy evaluation in Id. 


However, lazy evaluation can be very useful for programming with infinite structures (e.g., 
streams), and for large data structures of which only a small part is actually used. Steve 
Heller completed his Ph.D. thesis in January 1989, in which he investigated the design, 
use and implementation of explicitly designated lazy data structures in Id [158]. Heller 
and Jamey Hicks implemented lazy data structures in the graph interpreter (Gita) based 
on some preliminary work of UROP student Chuck Fabian. Heller was able to show that of 
the numerous examples of applications that used lazy evaluation in the literature, most of 
them needed only nonstrictness, not laziness. The few instances where laziness was actually 
necessary were easy to identify, and it was quite easy to use the explicit lazy data structures 
in Id. Jonathan Young and Hicks implemented a restricted version of lazy data structures 
on Monsoon (four states instead of five states in the state diagram, since Monsoon only has 
two status bits). Lazy data structures are being used to implement global constants and for 
stream programming, and have also been used in system code for memory allocation. 
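

As a rough illustration of an explicitly designated lazy data structure, the sketch 
below defines a delayed cell that is evaluated at most once and uses it to build an 
infinite stream. It is written in Python, not Id, and it uses only two states 
(unevaluated and evaluated) rather than the five-state or four-state designs mentioned 
above; the class and function names are assumptions made for illustration. 

```python
class Lazy:
    """A cell whose contents are computed on first use and then remembered."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._thunk()
            self._thunk = None        # drop the closure once it has run
            self._forced = True
        return self._value


def integers_from(n):
    """Infinite stream n, n+1, ...: only the tail is explicitly lazy."""
    return (n, Lazy(lambda: integers_from(n + 1)))


def take(k, stream):
    """Return the first k elements, forcing only as much as needed."""
    out = []
    while k > 0:
        head, tail = stream
        out.append(head)
        k -= 1
        if k > 0:
            stream = tail.force()
    return out


first = take(3, integers_from(5))
```

Only the stream tail is wrapped in Lazy; everything else is evaluated eagerly, which 
mirrors the observation above that laziness is needed in only a few identifiable places. 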


5.3.4 Managers 


Paul Barth continued his research on managers, a construct for supporting nondeterministic 
computation in Id. Nondeterministic constructs are needed for state-sensitive computation, 
including “application” programs, such as real time systems and database systems that 
respond to multiple inputs according to their temporal order. They are also necessary 
for “systems” programs, such as runtime support for the implementation of a functional 
language, which need to manipulate the state of the machine. 


The manager construct was redesigned to facilitate programming abstraction and efficient 
implementation. Rather than stream functions, managers have been recast as abstract data 
types, with operators that access and update a shared state. This is beneficial from two 
standpoints. As a programming construct, this makes the nondeterminism explicit while 
encapsulating the state transformation. Each potentially nondeterministic operator is easily 
identified, and can be written as a function from old state to new state. 
Managers are similar to monitors, but allow much more flexibility in scheduling the queues of 
waiting processes, and allow much more concurrency between state-manipulating procedures. 


From an efficiency point of view, the new paradigm allows mutual exclusion to be provided 
by hardware primitives rather than stream operations. These primitives, called locks, are 
an extension of I-structure operations that provide efficient mutual exclusion on individual 
memory cells. The design of locks (developed jointly by Barth, Soley, and Steele) is currently 
being filed for patent. The new manager construct is fully described in CSG Memo 294. 
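

The flavor of the redesigned construct can be conveyed by a small sketch. It is written 
in Python, with the standard threading.Lock standing in for the hardware lock primitive, 
and it does not model the scheduling of waiting processes. The Manager class and the 
bank account operators are hypothetical names; the actual construct is the one described 
in CSG Memo 294. 

```python
import threading


class Manager:
    """An abstract data type guarding a shared state.

    Each operator is a function from the old state to a pair
    (new state, result), applied under mutual exclusion.
    """

    def __init__(self, state):
        self._state = state
        self._lock = threading.Lock()   # stand-in for a hardware lock

    def apply(self, op):
        with self._lock:
            self._state, result = op(self._state)
            return result


def deposit(amount):
    return lambda balance: (balance + amount, balance + amount)


def withdraw(amount):
    def op(balance):
        if balance >= amount:
            return balance - amount, True
        return balance, False           # refuse: state is unchanged
    return op


def read():
    return lambda balance: (balance, balance)


account = Manager(0)
workers = [threading.Thread(target=account.apply, args=(deposit(10),))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
balance_after_deposits = account.apply(read())
```

The order in which the deposits are applied is nondeterministic, but each one is a pure 
function from old state to new state, so the nondeterminism is confined to the apply 
calls. 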


Managers were incorporated into the compiler, and applications were developed, including 
the dining philosophers problem, a shared bank account (with deferred debits), a printer 
scheduler, a buddy system memory allocator, and a union-find set algorithm. These exam- 
ples indicated that the new design was more perspicuous and efficient than stream-based 
managers. 


5.3.5 Other Language-related Work 


Sequential Implementations of Nonstrictness 


Ken Traub’s work on sequential implementation of nonstrict programming languages has 
continued, resulting in a paper presented at the Aspenäs Workshop on the Implementation 
of Lazy Functional Languages in Göteborg, Sweden. The paper is also to be presented at 
the 1989 Conference on Functional Programming Languages and Computer Architecture in 
London. 


Symmetric Lisp 


Suresh Jagannathan completed his Ph.D. thesis [169] on Symmetric Lisp, a novel parallel 
programming language in which naming environments (called maps) are first class objects. 
Through numerous programming examples, he was able to show that many diverse pro- 
gramming paradigms and constructs from other languages can be expressed quite elegantly 
with just the map construct. Examples include records, LET and LETREC blocks, “object- 
oriented” programs, file systems and directories, etc. 


Using a single construct (the map), both as a data structure as well as a control structure, 
raises some interesting questions about formal properties of programs, because names are 
used both as program variables and as field selectors. For example, in the expression: 


(with M e) 


a free name x in e is looked up in M, if M is a map with a field x; otherwise, it is looked up in the 
surrounding lexical environment. Jagannathan developed an inference algorithm to produce 
statically a conservative approximation that predicts which environment a name would be 
looked up in. A compiler could use this information to compile name lookups 
efficiently. He also showed an implementation of Symmetric Lisp in terms of a translation 
to dataflow graphs. 
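

The lookup rule for (with M e) can be sketched as follows, using Python dictionaries as 
stand-ins for maps and the standard collections.ChainMap for the two-level lookup. The 
function predict_scope is only a guess at the shape of a conservative static 
approximation (it assumes the analysis knows which fields M definitely has and which it 
might have); it is not Jagannathan's inference algorithm, and all names are hypothetical. 

```python
from collections import ChainMap


def with_env(m, lexical_env):
    """Environment for evaluating e in (with M e): M first, then lexical."""
    return ChainMap(m, lexical_env)


def predict_scope(name, definite_fields, possible_fields):
    """Conservative static answer to "where will this name be looked up?"

    definite_fields: fields M is known to have on every execution.
    possible_fields: fields M may have on some execution (a superset).
    """
    if name in definite_fields:
        return "map"
    if name not in possible_fields:
        return "lexical"
    return "unknown"                # must fall back to a dynamic lookup


env = with_env({"x": 10}, {"x": 1, "y": 2})
x_value = env["x"]                  # found in M
y_value = env["y"]                  # falls through to the lexical environment
```

Whenever predict_scope returns "map" or "lexical", a compiler could emit a direct access 
instead of the two-step search. 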


Optimal Interpreters for Lambda Calculus 


Vinod Kathail continued his investigation of optimal interpreters for the λ-calculus and 
functional languages based on the λ-calculus. The work in the last year focused on two 
aspects of the interpreter we had developed: formally proving its correctness and optimality, 
and simplifying its exposition. To relate our interpreter to the λ-calculus, we developed a 
new term calculus, which captures some of the essential features of the way the substitution 
operation of the λ-calculus is implemented in our interpreter. The term calculus is used as 
an intermediate step in proving the correctness of our interpreter; however, it may be of 
interest in its own right. We are in the process of completing the formal proofs [175]. 


PGL, A Signal Processing Language 


Janice Onanian completed a Master’s thesis in spring 1989, in which she developed a 
high-level signal processing language, called PGL, and a program graph representation for 
coarse-grain multiprocessors. Effective use of parallel processors requires dividing an application 
into concurrently executable tasks and assigning those tasks to processors such that their 
use of the network resources is optimized. We plan to use the language and graph 
developed in the thesis to find an optimal partitioning of an application into parallel tasks for a 
given hardware configuration. This involves two efforts: the development of algorithms for 
evaluating a task partition denoted by the program graph; and finding the optimal partition 
by varying the parameters to the program graph. Implementation of the PGL compiler is 
targeted for summer 1989, and development of the evaluation and optimization algorithms 
is planned to form the basis for subsequent doctoral research. 


Haskell, A New Functional Programming Language 


Arvind and Nikhil have continued to participate in the design of the new functional program- 
ming language, Haskell. As reported last year, Haskell is being designed by a group of about 
20 functional programming researchers from three continents. A draft of the language report 
was released to the public for comments in December 1988, which was followed by extensive 
discussion on the FP (functional programming) mailing list. The Haskell committee then 
met again in Mystic, CT in May 1989, where we charted the design decisions and actions to 
be taken before the final report is released in July 1989. 


5.4 Id World: The Id Programming Environment 


During the fall, R. Paul Johnson implemented a suite of interface functions designed by 
Richard Soley for Gita, the graph interpreter. This suite of functions, known as the Id 
World Interface (IWI), will support a variety of Id World user interfaces. Id World Version 
4.0 and earlier only provided a Lisp Machine-specific graphical interface. Id World Version 4.1 
includes a portable Common Lisp-based command listener. An X Window-based interface 
is under development. With the assistance of Jamey Hicks, Johnson released version 4.0 
for internal testing in early December. Version 4.0, with support for Symbolics Genera 
7.1/7.2 and TI Explorer 3.2/4.1, was shipped in January. Highlights of Version 4.0 include 
an optimizing compiler for Id 88.1, Id Mode Zmacs editor support, and the Gita graph 
interpreter with support for top level constants. Version 4.1, which adds support for Lucid 
Common Lisp Version 3 on Sun Workstations, was released externally for beta test in late 
March. 


The next version of Id World will have greater separation between modules than in the 
current version, so that each piece may be run separately in a UNIX environment as opposed 
to being tied to the Lisp Machine implementation. In addition, Hicks has been meticulously 
documenting the internals of the runtime managers and the compiler schemata used in the 
current system, as well as some of the desirable hacks on the new hardware. 


5.5 Project Dataflow: The Monsoon Prototype System 


5.5.1 The Monsoon Processing Element 


A very exciting milestone was met in September of 1988 when a single processor Monsoon 
prototype was made operational, able to execute incrementally compiled Id88 programs. The 
prototype implementation was engineered by Jack Costanza and Ralph Tiberio in compliance 
with the Monsoon microarchitecture specification developed by Greg Papadopoulos [253]. 


The Monsoon prototype is a 64-bit, fully pipelined (eight stages) dataflow processor. Con- 
structed from off-the-shelf components on a single large wire wrap panel (9U x 600mm), the 
processor processes a modest four million tokens per second or approximately three dataflow 
MIPS of which any proportion can be double precision floating point. The processor board 
is enclosed in a custom cabinet with suitable power supply and cooling, and then connected 
via ribbon cables to a simple NuBus interface card hosted in a Texas Instruments Explorer 
Lisp Machine. 


Hardware verification and debugging were facilitated by two design disciplines. First, we 
performed thorough timing simulations of the entire board on our Mentor design tools. 
During simulation we executed small dataflow graphs to verify overall operation and focused 
specifically on various matching operations and token enqueuing sequences. The second design 
discipline was to employ scan paths for (almost) all internal state. In scan path design, each 
parallel register can have its contents read and written through a special serial path, and 
multiples of such registers have their serial paths concatenated and then looped back to form 
a large scan ring. Any bit of processor state can be accessed by shifting these serial registers. 
Finally, the scan rings can be read and written through NuBus operations performed by the 
host Lisp Machine. 
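

The scan path discipline can be modeled in software. The sketch below is a toy model, 
not the Monsoon hardware or its host software: the register widths, the class name 
ScanRing, and the most-significant-bit-first layout are assumptions chosen for 
illustration. It shows how looping the serial path back on itself lets the host read 
every bit of state without disturbing it, and how new state is shifted in. 

```python
def encode(values, widths):
    """Pack register values into one bit list, most significant bit first."""
    bits = []
    for value, width in zip(values, widths):
        bits.extend((value >> i) & 1 for i in reversed(range(width)))
    return bits


def decode(bits, widths):
    """Split a bit list back into register values."""
    values, pos = [], 0
    for width in widths:
        value = 0
        for bit in bits[pos:pos + width]:
            value = (value << 1) | bit
        values.append(value)
        pos += width
    return values


class ScanRing:
    """Registers whose serial paths are concatenated into one ring."""

    def __init__(self, widths):
        self.widths = list(widths)
        self.bits = [0] * sum(self.widths)

    def shift(self, bit_in=None):
        """One scan clock; with no input bit the output is looped back."""
        out = self.bits[-1]
        self.bits = [out if bit_in is None else bit_in] + self.bits[:-1]
        return out

    def read_all(self):
        """Rotate the ring once; the state is unchanged afterwards."""
        captured = [self.shift() for _ in range(len(self.bits))]
        return captured[::-1]

    def write_all(self, new_bits):
        """Shift in a complete new state, last bit first."""
        for bit in reversed(new_bits):
            self.shift(bit)


widths = [4, 8]                       # hypothetical register sizes
ring = ScanRing(widths)
ring.write_all(encode([0xA, 0x3C], widths))
snapshot = decode(ring.read_all(), widths)
```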


The prototype processor comprises over 800 bits of scannable state. Software on the host Lisp 
Machine interprets and displays the processor state in a full screen format, with appropriate 
data conversions (e.g., floating point) and mnemonics (e.g., opcodes, field decodings). The 
prototype processor clock can also be single stepped under host control, and by repeatedly 
stepping the clock and scanning state a full suite of software breakpoint conditions can be 
established. In essence, we used the combination of scan path design and host software 
to develop a sophisticated in-system logic analyzer. We found this to be a very effective 
debugging technique. 


The Monsoon prototype forms the basis for the production Monsoon processor, a printed cir- 
cuit board version to be manufactured by Motorola. Several improvements are incorporated 
in the production version. 


e A network port based on the PaRC and link chips is added to permit the construction 
of multipie processor systems. 


e A set of exception mechanisms and more complete support for system programs (e.g., 
loader, garbage collector) have been designed. 


e The host interface has been changed from NuBus to VME and a high bandwidth DMA 
path has been added from the host into Monsoon frame store. 


e The instruction format has been changed slightly to permit a wider opcode field (from 
10 bits present to 12 bits) and variant formats are introduced that allow either two 
explicit destinations or a large absolute address displacement (20 bits). 


• Much of the datapath has been byte sliced into 10,000 gate CMOS arrays (eight identical 
slices), and the specialized ALU function that manipulates tags (the Pointer 
Increment Unit) has been cast by George Wang into a similar sized array. 


• The pipeline rate has increased to ten million tokens per second, approximately seven 
million dataflow instructions per second. 


• The board size has been reduced from 9U x 600mm to 9U x 400mm (“Sun size”) 
through the use of gate arrays and surface mount assembly. 


The production processor is in the final detailed design and simulation phase. We expect to 
hand off the design to Motorola by June 1989. 


5.5.2 The Interconnection Network for Monsoon 


Andy Boughton, Chris Joerg, and John Santoro continued their work on the network for 
Monsoon. We have continued to develop the two chips that will be used in the network, the 
Packet Routing Chip (PaRC) and the Data Link Chip (DLC). PaRC is a four input four 
output packet router on a chip and is the primary component of the Monsoon network. DLC 
contains a data link transmitter and a data link receiver. The transmitter will allow a PaRC 
output port to be connected to an interboard cable and the receiver will allow an interboard 
cable to be connected to a PaRC input port. 


Joerg has continued the development of PaRC; the design has not changed significantly over 
the past year. Some work has been done to enhance the statistic collection abilities. Also, 
some improvements were made to the control port of PaRC. The control port is the section 
that allows a local controller to control several parameters of the chip’s operation (such as 
how to do routing and what to do when errors are seen). Most of the work done on PaRC 
has involved creating test vectors. These vectors will be used to ensure that fabricated chips 
do not contain any functional defects. 


Santoro continued the development of DLC. During the past year, we have used the preliminary 
logic design completed last year to develop a detailed design for DLC in Motorola’s 
Mosaic II ECL gate array technology. 


The top level design of DLC has changed somewhat during the year. The primary change 
was the elimination of 4 into 6 encoding. Our original design called for the encoding of all 
data transmitted over interboard cables. The primary advantage of this encoding was the 
elimination of the DC component of the transmitted signal. However, encoding required 
that the DLC be designed to operate on a 50% faster clock. Designing DLC for such a clock 
turned out to be a fairly difficult task. Faced with this difficulty, we ran a large number 
of tests on our proposed drivers, receivers, and cable to determine whether data could be 
reliably transmitted without encoding. Our tests indicated that a data pattern containing 
an arbitrarily long sequence of 0’s followed by a 1 and another long sequence of 0’s could 
be transmitted over the cable with more than sufficient noise immunity. Our tests indicated 
that the inverse pattern also worked. Based on these tests we elected to simplify design of 
the DLC by removing encoding. 
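
The trade-off behind this decision can be made concrete. Sending 6 code bits for every 4 data bits means the link must run at 6/4 = 1.5 times the data rate, which is the 50% faster clock mentioned above. The sketch below assumes a balanced code in which every 6-bit codeword has exactly three 1's and three 0's; the report does not give the actual code table, so `ENCODE` is a hypothetical table chosen only for illustration. There are 20 such balanced words, enough to cover the 16 possible 4-bit values.

```python
from itertools import combinations


def balanced_words(length=6, ones=3):
    """All words of the given length with exactly `ones` bits set."""
    words = []
    for positions in combinations(range(length), ones):
        words.append(sum(1 << p for p in positions))
    return sorted(words)


# Hypothetical code table: the first 16 balanced 6-bit words in numeric order.
ENCODE = balanced_words()[:16]
DECODE = {word: nibble for nibble, word in enumerate(ENCODE)}


def encode(nibbles):
    """Map each 4-bit value (0..15) to its 6-bit balanced codeword."""
    return [ENCODE[n] for n in nibbles]


def decode(words):
    """Inverse of encode for words drawn from the table."""
    return [DECODE[w] for w in words]


def disparity(words, length=6):
    """Ones minus zeros over the whole stream; 0 means no DC component."""
    ones = sum(bin(w).count("1") for w in words)
    return ones - (length * len(words) - ones)


# Six line bits carry four data bits, so the line clock is 1.5x the data clock.
CLOCK_RATIO = 6 / 4
```

Even an all-zero data stream encodes to a stream with zero disparity under such a code, which is the property that was given up when the tests showed the unencoded patterns could be transmitted reliably.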


The detailed design of DLC has been completed and simulated. Test vectors have been 
written which are sufficient for testing fabricated chips for faults. A preliminary version of 
the design has been transferred to Motorola. The final version of the design should be given 
to Motorola before June 30, 1989. 


5.5.3 The I-structure Memory Board 


During the spring, Ken Steele began work on a hardware I-structure controller design that 
will implement I-structures and the new memory operations developed for the Monsoon 
prototype. Each board is expected to provide 4MW (64 bits/word) and to be 
capable of handling up to five million requests per second through an onboard PaRC chip. 


5.5.4 MINT: a Monsoon Simulator 


Andy Shaw and Jonathan Young implemented a simulator for the Monsoon architecture 
which proved to be an invaluable tool for debugging the hardware. For his S.B. thesis, 
Shaw then extended this project into a complete interpreter that is capable of mimicking 
the hardware with great precision. The intent is that any object code that runs on Monsoon 
will run without modification on MINT. The design is very modular, and uses the Monsoon 
microcode compiler described below. 


Since we wish to accurately simulate the processor, regardless of its current microcode, a 
microcode to Common Lisp compiler was designed and coded by Derek Chiou. The compiler 
accepts Monsoon microcode and translates it into Common Lisp comparable to hand code 
in efficiency. Secondary considerations were human readability and code size. Thus, the 
identical microcode specification used to drive the actual hardware is also compiled for the 
simulator, with obvious benefits of hardware simulator consistency. The compiler is flexible 
enough to adapt to any foreseeable microcode changes. The compiler is written in Common 
Lisp. 


5.6 Implementations of Id 


5.6.1 Id on Monsoon 


Jonathan Young and Jamey Hicks spent most of their time this year porting the existing 
Id compiler to Monsoon (with some initial work by Bradley Kuszmaul). This enables us to 
run real programs on the Monsoon wire wrap prototype. We now have a working Monsoon 
compiler, as well as a loader, a runtime system, an execution manager, and a rudimentary 
debugger; the standard libraries have also been ported. 


While much of the work of porting the compiler was easy because the Monsoon ETS 
architecture strongly resembles the previous TTDA architecture, the runtime system required 
major work. The Gita simulator relied on the storage management of the Lisp Machines; 
on Monsoon we implemented handcoded managers for free lists, frames for procedure calls, 
and two different heaps. We also implemented managers for I-structures and semaphores 
(“locks”) to tide us over until we have a working I-structure memory board. In addition, 
special managers were needed to support particular language features such as delays and 
accumulators. 


Using the execution manager, the user may now call any Id function which has been compiled 
and loaded into Monsoon with as many arguments as desired. Execution is currently limited 
to either “run until done” or a general single stepper in which all eight stages of the Monsoon 
processor pipeline are visible. After a program error has been detected, various tools allow 
the user to view waiting tokens, data structures, and instruction memory. 


Barth and Young developed a graph browser, and Doug Stetson improved the display 
heuristics. The browser proved to be very useful for debugging both the compiler and Id programs, 
and it has been installed in the Monsoon system. When compiling, the graph of a procedure 
is optionally displayed; it is also possible to view the graph of a procedure with its waiting 
tokens after a partial execution on Monsoon. 


5.6.2 Storage Management 


Ken Steele wrote microcode for the processor prototype to emulate I-structures in the 
processor's memory. Also, new instructions were created to support the runtime environment 
and compiler. These included non-busy waiting locks and support for lazy evaluation. A 
patent has been applied for on the non-busy waiting locking mechanism. 


A storage management system was implemented in Id for dynamic allocation and dealloca- 
tion of structure memory and frame memory. Stephen Brobst implemented a buddy system 
algorithm using nondeterministic lock and unlock primitives, which were provided as exten- 
sions of the Id language as a result of work done by Barth. Multiple instantiations of the 
allocation and deallocation routines can proceed in parallel, with suspension occurring only 
when two allocations attempt to allocate blocks of the same size. Fast path execution of 
the memory allocation routine requires fewer than 50 RISC-like instructions for its critical 
path. Young ported the buddy system to the Monsoon architecture and augmented the 
storage management system with stack-based allocation mechanisms for cons cell and 
fixed-size frame memory allocation. Brobst has also written an Id version of the first-fit algorithm 
and is experimenting with various granularities for free list management. 
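
For readers unfamiliar with the buddy system, the following is a minimal sequential sketch in Python. It is not Brobst's Id implementation: the class name `BuddyAllocator`, the use of block orders (a block of order k holds 2**k words), and the list-based free lists are assumptions made here for illustration. In the parallel Id version described above, one can think of each per-size free list as the unit protected by a lock, which is why suspension occurs only between allocations of the same size.

```python
class BuddyAllocator:
    """Sequential buddy allocator over a region of 2**total_order words."""

    def __init__(self, total_order):
        self.total_order = total_order
        # free_lists[k] holds the offsets of free blocks of size 2**k.
        self.free_lists = [[] for _ in range(total_order + 1)]
        self.free_lists[total_order].append(0)

    def allocate(self, order):
        """Return the offset of a free block of size 2**order, or None."""
        k = order
        # Find the smallest free block that is large enough.
        while k <= self.total_order and not self.free_lists[k]:
            k += 1
        if k > self.total_order:
            return None
        offset = self.free_lists[k].pop()
        # Split down to the requested size, freeing the upper half each time.
        while k > order:
            k -= 1
            self.free_lists[k].append(offset + (1 << k))
        return offset

    def free(self, offset, order):
        """Release a block, coalescing with its buddy while the buddy is free."""
        while order < self.total_order:
            buddy = offset ^ (1 << order)  # buddy differs in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].append(offset)
```

The fast path is the case where the free list for the requested size is non-empty, so that `allocate` performs a single pop with no splitting.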


A first version of the Id runtime system has been specified and is now under implementation. 
The storage management system will leverage the work of Barth, Brobst, and Young to 
provide dynamic allocation of structure storage and frames for large codeblocks using the 
buddy system, and in-line stack allocation of memory for cons cells and fixed-size frames. 
The I/O subsystem will provide a primitive interface to the file system using string objects 
and standard system call interfaces for file open, close, read, and write. Extensions to the 
Id language for synchronizing multiple reads and writes to a single file are an active area of 
research. 


5.6.3 Long Term Software Structure 


The above retargeting of Id for Monsoon uses the TTDA code that is produced from the 
existing backend of the compiler. This is not an attractive route in the long term. Young 
has written a specification of the ETS abstract machine [317] for use in compiling to the 
Monsoon architecture as it slowly evolves. 


Traub has designed the architecture of the software system which will support Monsoon, to 
be jointly implemented at MIT and at Motorola Cambridge. The greatest difference between 
the new software system and the old TTDA/Gita system is one of modularity. Whereas the 
functions of loading Id programs, running them, debugging them, and displaying runtime 
statistics were previously all handled by the Gita program, in the new architecture each of these 
functions will be handled by separate programs, with a top level program provided to present 
the user with essentially the same programming environment as found in the current Id 
World. The resulting system will be much more robust and flexible, and will point the way 
for the eventual migration of these functions onto the dataflow processor itself. Perhaps even 
more important, these programs are designed to work both with Monsoon hardware and its 
software emulation (MINT). Local area networks are an important part of the new system, 
both in the use of X Windows as the framework for the user interface and in providing a 
network path to the Monsoon hardware or emulator. This will allow for easy sharing of 
a Monsoon processor among several users. The software architecture and all the interfaces 
between the components are thoroughly documented in [301], edited by Traub. 


Hicks and Traub designed the Monsoon Object Code (MOC) format. This is the format in 
which the Id compiler (and other programs) will write object files. MOC is based on CIOBL 
(Common Input/Output Base Language), redesigned by Traub. 


Hicks has also made an initial design of the Id Object Format. The Id Object Format 
describes the data structures that will be loaded for an Id program. There will be structures 
for each procedure, global constant, and code block compiled and loaded. These structures 
will hold computed values and program code, and will support dynamic linking. They will 
also have source information for use in debugging. All program information that is needed 
at runtime will be structured using the Id Object Format. The actual object files will be 
encoded into MOC. 


5.6.4 Experiments with Structure-storage Management 


Jamey Hicks extended the Id compiler to handle data structure release annotations. This 
allows us to deallocate data structures relatively painlessly, but it is not meant to be a 
language feature that users will employ. It is meant to be an experimental feature, so that 
we can compare the performance of hand-annotated programs with that of automatically 
annotated programs. 


The syntax of the annotation is: 


@release IDENTIFIER; 


@release IDENTIFIER_0, IDENTIFIER_1, ... IDENTIFIER_n; 


inside a block expression. This annotation specifies the release of the structure bound to 
IDENTIFIER_i when all computation enclosed within the block expression has terminated. 
This only releases the storage corresponding to the top level of the structure; the compiler 
cannot determine how much sharing of substructures there is in the program, so it does 
not release them. The compiler inserts the synchronization code necessary to ensure that 
the object is not released until all of the code in the block has terminated computation. 


Inside a loop, @RELEASE actually has two meanings: if the structure is not circulated, then it is 
released when the current iteration has terminated; otherwise, if the structure is circulated 
in the loop, then it releases all but the first and last values of the structure when the 
corresponding iteration has terminated. The release of circulating structures is accomplished 
by unrolling the loop once, and not releasing the structure in the initial execution of the 
body. 


Here is an example of the @RELEASE annotation in the multiwave procedure: 


%%% Run several iterations of the wavefront. 
%%% Illustrates the arbitrary chaining achievable in dataflow. 
%%% Release the intermediate waves when through with them. 
def multiwave edge_vector n = 
  {m = initial_wave edge_vector; 
   in 
     {for i <- 1 to n do 
        next m = wave m; 
        @release m; 
      finally m }}; 


Young has written a simple compile-time analysis program which determines when it is safe 
to deallocate structures in loops; deallocation annotations are then automatically added to 
the program. 


5.6.5 Resource Management in Scientific Programs 


David Culler made substantial progress this year toward effective management of parallelism 
and resources in dataflow programs. The problem is that exploiting parallelism to achieve 
high performance invariably increases the resource requirements of a program. This phenomenon 
is not particular to dataflow; it can be observed to some degree in any form of 
parallel execution. However, it is particularly serious under dynamic dataflow execution, 
because all the potential parallelism in a program is exposed. This means that ample parallelism 
is available on a broad class of programs but, unfortunately, the resource requirements 
of many programs are excessive, often leading to deadlock. Culler documented both sides 
of this dilemma using parallelism and resource profiles of a variety of scientific programs 
derived under an ideal dataflow execution model (supported by Gita). 


In 1985, he developed a mechanism for controlling parallelism, called k-bounded loops. Basically, 
loops are compiled into dataflow graphs in a manner that allows the maximum number 
of concurrent iterations to be set dynamically, when the loop is invoked. This approach is 
appealing for scientific programs, which are dominated by iterative computations over large, 
regular data structures. It has played a central role in the evolution of tagged token dataflow 
architectures toward Explicit Token-Store machines and hybrid machines because it allows 
the tag space to be used densely. Also, it provides a natural means of reusing resources 
within iterative computations. The question he has been exploring recently is how to assign 
the k-bounds automatically. 
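
The effect of a k-bound can be seen in a small timing model. The sketch below is an illustration only, not Culler's compilation scheme: it assumes that iterations are independent, that iteration i reuses the resources of iteration i - k and so cannot start until that iteration has finished, and that iterations are initiated in order. The function name `run_k_bounded` and the abstract time units are hypothetical.

```python
def run_k_bounded(durations, k):
    """Simulate a loop with at most k iterations in flight.

    durations[i] is the running time of iteration i in abstract time units.
    Returns (total_time, peak_in_flight)."""
    start, finish = [], []
    for i, d in enumerate(durations):
        # Iterations are initiated in order.
        t = start[i - 1] if i > 0 else 0
        # Iteration i reuses the slot of iteration i - k, so it waits for it.
        if i >= k:
            t = max(t, finish[i - k])
        start.append(t)
        finish.append(t + d)
    total = max(finish, default=0)
    # Count the iterations active at each start time to find the peak.
    peak = max(
        (sum(1 for s, f in zip(start, finish) if s <= t < f) for t in start),
        default=0,
    )
    return total, peak
```

With six iterations of three time units each, k = 1 gives fully sequential execution, k = 6 gives full unfolding, and intermediate values trade running time against the number of iteration contexts that must exist at once. Because k is an ordinary argument here, it mirrors the point that the bound can be set dynamically when the loop is invoked.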


The approach Culler has taken is to rely heavily on static analysis to characterize the 
dynamic behavior of programs. There are two aspects of this analysis: worst case resource 
requirements and expected parallelism. A representation of the dynamic call structure of 
the program is constructed and annotated with symbolic resource expressions which are 
parametric in the k-bounds and in certain program variables. In addition, loops are classified 
as having limited useful unfolding, expensive unfolding, or efficient unfolding. Based 
on this analysis, the program is augmented with resource management code that computes 
the k-bounds by simple formulae derived from the resource expressions that capture a high 
level policy, e.g., favor the middle level in this triply-nested loop. A variety of policies have 
been examined analytically and empirically, and a particular policy has been effective in 
containing the resource requirements of scientific dataflow programs, while exposing adequate 
parallelism. 


This work forms the beginning of a bridge between our research in dataflow execution and the 
work in parallel execution of FORTRAN. In our case, the problem is to constrain potential 
parallelism that is not cost effective to exploit. In the FORTRAN case, the problem is to 
determine where it is most cost-effective to uncover parallelism. We will never reach exactly 
the same place, because our analysis must err in the direction of assuming two computations 
cannot be serialized, while theirs must err in the direction of assuming two computations 
cannot execute in parallel. Still, we expect there will be a valuable cross-fertilization. 


5.6.6 Speculative Parallelism 


Richard Soley completed his Ph.D. thesis work this year on the control of speculative 
parallelism in Id programs, under the abstract tagged token dataflow execution model. Although 
resource control models for exploiting the parallelism in large scientific codes have been 
recently explored, no approach to exploiting speculative, searching parallelism has been 
explored, even though (or perhaps because) the potential parallelism of such applications is 
tremendous. Soley explores a view of speculation as a process which may proceed in parallel 
in a controlled fashion, using examples from actual symbolic processing situations. 


The central issue of exploiting this parallelism is the dynamic containment of the resources 
necessary to execute large speculative codes. Soley shows efficient structures (graph schemata 
and architectural support) for executing highly speculative programs (such as expert 
systems) under a dataflow execution paradigm. In order to control dynamic execution graph 
growth, Soley develops controls over cross-procedure parallelism in an extensible manner, 
with applications to the various current problems of dataflow computation. Approaches to 
scheduling, prioritization, and search tree pruning were considered, evaluated, and compared. 


In his thesis, Soley fleshes out the details of primitive execution resource management 
(function application and memory allocation), giving implementations for general and 
primitive resource managers and other nondeterministic constructs at the Id language level. 
Dynamic binding of managers is also presented to give a meaning to the term “task;” Soley’s 
work supports the prioritization and termination of dynamically defined tasks. 


The underlying constructs used by Soley’s speculation control features rely on an extended 
definition of I-structure storage. This new definition adds an uncontrolled structure WRITE 
(as opposed to STORE) instruction, which overwrites I-structure cell contents. This 
nondeterministic feature is useful for implementing higher level control constructs, as Soley shows. 


More revolutionary, however, is the new cell locking paradigm developed by Soley, Steele, 
and Barth. The new scheme is detailed in Soley’s Ph.D. thesis, Steele’s upcoming Master’s 
thesis, and Barth and Nikhil’s report [34]. The new locking structure of I-structure memory, 
already implemented in the Gita simulator (by Soley and Barth) and the Monsoon prototype 
(by Steele), relies on the existence of structure presence bits and deferral lists to allow critical 
section coding of resource managers and the like. In addition to supporting busy-waiting-free 
lock primitives, these “dataphores” also allow the storage of data in the semaphore cell itself 
(hence the new name). The basic contracts of the locking instructions¹ are the following: 


• READ-AND-LOCK (cell): returns only when the cell has been locked, with the value 
written to the cell when it was allocated or last unlocked. 


• WRITE-AND-UNLOCK (cell, value): unlocks the cell specified, writing the given value 
into the cell. 
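
The two contracts above can be sketched in software as follows. This is an illustrative model, not the Gita or Monsoon implementation: the class name `Dataphore`, the use of Python callables as continuations, and the first-in first-out service order are assumptions made here (the report notes that the real queue order is nondeterministic). The `locked` flag stands in for the presence bits and `deferred` for the deferral list.

```python
from collections import deque


class Dataphore:
    """A lockable cell that also stores data, with a deferral list for waiters."""

    def __init__(self, value):
        self.value = value        # value written at allocation time
        self.locked = False       # stands in for the cell's presence bits
        self.deferred = deque()   # continuations waiting to acquire the lock

    def read_and_lock(self, continuation):
        """Deliver the cell value to `continuation` once the lock is acquired.

        If the cell is already locked, the request is deferred rather than
        retried, so no busy waiting takes place."""
        if self.locked:
            self.deferred.append(continuation)
        else:
            self.locked = True
            continuation(self.value)

    def write_and_unlock(self, value):
        """Write `value` and release the lock, or pass both to a deferred reader."""
        if not self.locked:
            raise RuntimeError("write_and_unlock on an unlocked cell")
        self.value = value
        if self.deferred:
            # The cell stays locked; ownership moves to the next waiter.
            self.deferred.popleft()(value)
        else:
            self.locked = False
```

A resource manager can keep its state (for example, a free list head) in the cell itself: each request reads and locks the cell, computes the new state, and writes it back while unlocking, which gives the critical section behavior described above.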


Recognizing that these instructions also support a primitive queueing mechanism (albeit of 
nondeterministic queue order), several other uses for this new feature have been found. MIT 
is pursuing a patent on this extension to dataflow (and other message passing) architectures. 


5.6.7 Garbage Collection for Id on Monsoon 


Arun Iyengar has begun looking at garbage collection on dataflow multiprocessors. We are 
implementing a copying garbage collector for Monsoon. Simultaneously, Young is looking 
at compile-time techniques for detecting when heap objects are no longer needed. We plan 
to quantitatively study the amount of storage which can be reclaimed by garbage collection 
and static program analysis. We are also interested in the increased execution time and 
additional support required by these two different approaches for reclaiming heap storage. 


5.6.8 Parallel I/O 


Bhaskar Guha Roy worked on the design of a parallel I/O system for a dataflow machine. 
In addition to processing elements and I-structure memories, he proposed that disk units be 
attached to the interconnection network. Processors would interact with the disk units using 
split-phase transactions in a manner similar to I-structures. To initiate a disk transfer, the 
processor sends a token to a disk unit, specifying the direction of transfer (read/write), the 
address of the disk block, the address of an I-structure for the data, and the continuation of a 
thread that awaits the completion of the transfer. The objective is to tolerate disk latencies 
in exactly the same way that the latency of I-structure accesses is currently tolerated by the 
processor. Guha Roy designed language constructs to express parallel I/O in the presence 
of nonstrict data structures, and designed compilation techniques for them. 
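
The request token described above can be pictured with a short sketch. It is a simplified model under stated assumptions, not Guha Roy's design: the names `DiskRequest` and `DiskUnit` are hypothetical, a Python list stands in for the I-structure, and a tuple stands in for the continuation of the waiting thread.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DiskRequest:
    direction: str        # "read" or "write"
    block: int            # address of the disk block
    istructure: list      # stands in for the address of an I-structure
    continuation: tuple   # thread to resume when the transfer completes


class DiskUnit:
    """Disk unit attached to the network, serving split-phase requests."""

    def __init__(self, blocks):
        self.blocks = blocks      # block address -> list of words
        self.pending = deque()

    def send(self, request):
        # Phase one: the processor issues the request and continues with
        # other work; nothing blocks here.
        self.pending.append(request)

    def service_one(self, ready_queue):
        # Phase two: the disk completes one transfer and requeues the
        # continuation of the thread that was waiting for it.
        request = self.pending.popleft()
        if request.direction == "read":
            request.istructure[:] = self.blocks[request.block]
        else:
            self.blocks[request.block] = list(request.istructure)
        ready_queue.append(request.continuation)
```

Between `send` and `service_one` the processor is free to run other threads, which is how disk latency is tolerated in the same way as I-structure latency.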


¹Variously called lock/unlock, read-and-lock/write-and-unlock and take/put; here we shall use the most 
verbose forms. 


5.6.9 Other Monsoon-related Work 


Ken Steele and Richard Soley proposed a design for integrating virtual memory address 
translation into the dataflow model [294]. 


Lina Muryanto and Peter Tan wrote a compiler that takes ETS code from the Id compiler 
and produces MC68020 code, so that Id programs may be run on Sun workstations. It uses a 
MIPS-like RISC language as an intermediate form, to facilitate porting it to other machines. 
So far, the compiler only accepts a small subset of the full language, and much work remains 
to be done in optimization. 


5.7 Applications 


We are happy to report an increase in the number of large application programs being written 
in Id. 


5.7.1 Simulated Annealing 


Stephen Brobst and Phil Kuhn implemented a number of different algorithms for simulated 
annealing. Simulated annealing is a heuristic that is commonly applied to a large class of 
optimization problems that are known to be NP-complete, such as scheduling and building 
layout. They found that although the purely functional subset of Id did not lend itself 
well to an efficient implementation, accumulators provided an elegant paradigm for handling 
the nondeterministic aspects of the algorithm without sacrificing overall determinacy in the 
program. They also made use of Barth’s lock and unlock primitives along with structure 
overwrites to implement a purely nondeterministic, nonfunctional version of the program. 
The ability to overwrite structure elements without copying the full structure provided a 
large reduction in the number of instructions during program execution. However, the 
synchronization required for correct implementation of the algorithm in the presence of structure 
overwrites actually increased the critical path length of the program. Moreover, debugging 
and program design in the presence of locking and structure overwrite primitives became 
substantially more difficult. Issues of deadlock, nondeterminism, read-write races, etc., which 
were previously not present in the deterministic implementations, became major stumbling 
blocks in the parallel execution environment. 


5.7.2 DNA Sequence Algorithms 


DNA sequence data is accumulating very rapidly. If the genetic sequence of the entire human 
genome is determined, databases will grow by two to three orders of magnitude from their 
current sizes. Parallel processing is becoming increasingly important as biological sequence 
data increases. Arun Iyengar implemented several different algorithms for comparing 
sequences using Id. Implicit parallelism makes Id a very easy language to use. One drawback 
is the extra copying required when an aggregate data structure needs to be updated. 


5.7.3 Flight Path Generation 


For his Ph.D. thesis, Michel Sadoune of the Department of Aeronautics and Astronautics 
has implemented a Terminal Area Trajectory Planning System for air traffic control. 


A Flight Path Generator is defined as the module of an automated air traffic control system 
which plans aircraft trajectories in the terminal area with respect to operational constraints. 
The flight path plans have to be feasible and must not violate separation criteria. 


The problem of terminal area trajectory planning is structured by putting the emphasis on 
knowledge representation and air-space organization. A well defined and expressive semantics 
relying on the use of flexible patterns is designed to represent aircraft motion and flight 
paths. These patterns are defined so as to minimize the need for replanning and to smoothly 
accommodate operational deviations. 


Flight paths are specified by an accumulation of constraints. A parallel, asynchronous im- 
plementation of a computational model, based on the propagation of constraints, provides 
mechanisms to efficiently build feasible flight path plans. A network of constraints is imple- 
mented as the superposition of dataflow graphs which are synchronized distributively. 


A methodology for a fast and robust conflict detection between flight path plans is intro- 
duced. It is based on a cascaded filtering of the stream of feasible flight paths and combines 
the benefits of a symbolic representation and of numerical computation with a high degree 
of parallelism. 


The Flight Path Generator is designed with the goal of implementing a portable and evolving 
tool which could be inserted in controllers’ routine with minimum disruption of present 
procedures. 


Flight path generation and conflict detection have been implemented in Id. The program, 
which is run with various machine configurations, comprises 600 procedures totaling 
5000 lines of Id code. It is used as a test program for the Monsoon compiler. 


The conflict-free feasible flight paths which are generated and tested in an Id environment 
can be translated into Lisp data structures by using an interface between Id and Common 
Lisp. They are then displayed on the screen and simulated in an interactive manner. 


5.7.4 DARPA Image Understanding Benchmark 


Arthur Altman, visiting from Texas Instruments, began implementing the DARPA Image 
Understanding benchmark as an Id application. This benchmark performs model-based 
recognition of a 2 1/2 D “mobile” of rectangles from two 512 x 512 pixel images, one 
containing intensity data (8-bit integers), the other depth data (32-bit IEEE floating point). 
As such, it performs extensive numeric (data-directed) and symbolic (knowledge-directed) 
processing. Once the benchmark has been converted to Id, he will evaluate its potential 
parallelism and related performance parameters on the simulated TTDA target machine 
provided by the Gita environment. 


5.8 P-RISC 


Our work has continued to bridge the once-wide gap between von Neumann and dataflow 
architectures. In 1988, Nikhil and Arvind proposed a new processor architecture called 
P-RISC (for Parallel RISC) that properly extends a conventional RISC processor in such a way 
as to make it more suitable as a component for a parallel machine. The architecture will 
be presented at the 1989 International Symposium on Computer Architecture in Jerusalem 
[243]. 


We first organize the machine so that instruction and frame memory are local to a processor, 
while heap memory is global. Next, we identify a frame (activation record) as the register 
set for a thread. The control state of a thread can now be described succinctly as a token 
containing an instruction pointer and a frame pointer. 


We now reorganize the processor so that it is multithreaded. The first step is to introduce a 
token queue that can contain multiple tokens. On each clock a token is dequeued and sent 
through the processor pipeline. The instruction it points to is fetched and executed relative 
to the frame that it points to. Finally, a new token is produced that is reinserted into the 
token queue. Note that successive tokens can be from unrelated threads. 


To deal with long memory latencies, we use the technique of I-structures. A load instruction 
sends a request to memory along with a return continuation. Meanwhile, the processor 
is free to execute other tokens. The response from memory comes back with the return 
continuation: the value is stored in the frame and the continuation is requeued. 


For fine-grained parallel operation, we extend the instruction set with three new instructions: 


• fork, which is like a jump, except that it also produces the token for the next instruction 
(i.e., it is like a jump and continue). 


• join, which specifies a frame offset containing a counter initialized to n, the number 
of threads that will execute this instruction. Each execution decrements the counter. 
Only the thread that decrements it to 0 continues; the other threads are discarded. 


• start, which specifies a continuation in a different frame (which may be on a different 
processor), along with a value to be stored in that frame before the continuation is 
started. 


With these instructions, it is possible to emulate the fine-grained parallelism of a dataflow 
graph. Being a superset of a conventional RISC instruction set, it is also possible to execute 
conventional compiled code, e.g., from FORTRAN. 
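
The token queue and the fork and join instructions can be illustrated with a toy interpreter. This is a sketch under stated assumptions, not the P-RISC instruction set: the tuple encoding of instructions, the opcode names `add`, `jump`, and `halt`, and the use of dictionaries as frames are hypothetical choices made here, and the start instruction and split-phase loads are omitted for brevity.

```python
from collections import deque


def run(program, frames, start_token):
    """Execute tokens of the form (instruction pointer, frame pointer)."""
    queue = deque([start_token])
    while queue:
        ip, fp = queue.popleft()      # one token enters the pipeline
        op, *args = program[ip]
        frame = frames[fp]
        if op == "add":               # frame[dst] = frame[a] + frame[b]
            dst, a, b = args
            frame[dst] = frame[a] + frame[b]
            queue.append((ip + 1, fp))
        elif op == "jump":
            queue.append((args[0], fp))
        elif op == "fork":            # jump and continue: two tokens result
            queue.append((args[0], fp))
            queue.append((ip + 1, fp))
        elif op == "join":            # only the last arriving thread continues
            frame[args[0]] -= 1
            if frame[args[0]] == 0:
                queue.append((ip + 1, fp))
        elif op == "halt":
            pass                      # thread ends; no new token is produced
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return frames


# Compute z = (a + b) + (c + d), with the two inner sums in separate threads.
PROGRAM = [
    ("fork", 3),            # 0: start a second thread at instruction 3
    ("add", "x", "a", "b"), # 1: first thread
    ("jump", 4),            # 2
    ("add", "y", "c", "d"), # 3: second thread
    ("join", "count"),      # 4: both threads meet here
    ("add", "z", "x", "y"), # 5: executed once, by the survivor
    ("halt",),              # 6
]
FRAMES = {0: {"a": 1, "b": 2, "c": 3, "d": 4, "x": 0, "y": 0, "z": 0, "count": 2}}
```

Calling `run(PROGRAM, FRAMES, (0, 0))` interleaves tokens from the two threads through the same queue, which reflects the observation that successive tokens in the pipeline can come from unrelated threads.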


We have begun simulation and other studies to evaluate this architecture, described below. 


5.8.1 Compiling for P-RISC 


Bradley Kuszmaul specified an abstract P-RISC instruction set, complete with operational 
semantics (specified by a relation on machine states). 


He is implementing a P-RISC code generator for the Id compiler. Code generation proceeds 
by transforming a data flow graph into a control flow graph, performing certain optimiza- 
tions, and then transforming the control flow graph into machine specific code. Examples of 
machine specific code which might be generated include: 


• the abstract P-RISC instruction set mentioned above; 


• other specific P-RISC instruction sets (such as the P-RISC co-processor for a RISC 
chip being worked on by Sharma, see below); 


• a variant of Monsoon with registers; 


• Eps'88; 


• a standard serial machine (such as a RISC computer, a Vax, a Lisp machine, or a Cray 
supercomputer); or 


• off-the-shelf parallel MIMD hardware. 


It appears that the control flow graph intermediate format is well suited for the target 
architectures mentioned above. Currently only parts of the Id language are correctly compiled 
to control flow graphs, and the only machine specific code generated by the compiler is 
for the abstract P-RISC instruction set and serial code for the Lisp Machine. Preliminary 
results indicate that it may be possible to run Id programs almost as fast, i.e., within a factor 
of four to ten, as Lisp or C programs. 


5.8.2 Simulator for P-RISC 


Ira Scharf, as part of his S.B. thesis, has been working on an interpreter for the abstract 
P-RISC instruction set developed by Kuszmaul. The objective is to build a tool like Gita,
our graph interpreter for the TTDA, that has proved so invaluable in evaluating the TTDA. 
A first version of the interpreter is now running. 


5.8.3 Implementation of P-RISC Using Ordinary RISC Processors


Preliminary to the P-RISC work, Kuszmaul and Sharma surveyed commercial RISC chips,
with an eye towards P-RISC implementation. We then did some design and back-of-the- 
envelope analysis of various strategies for implementing P-RISC on commercial RISC hard- 
ware (possibly using some sort of co-processor to provide a hardware assist for the P-RISC 
specific operations). Those (very preliminary) results indicate that an unmodified commer- 
cial RISC computer might lose only a factor of 10 to 15 over a dedicated P-RISC processor. 
By adding an I-structure memory to the RISC computer, the performance degradation
compared to a P-RISC processor drops to a factor of four or five (Steele has spent some effort
thinking about how to make I-structure memory work for a RISC processor). By adding
hardware assist for context switching, that degradation goes down to a factor of two or three.


Sharma has been trying to identify a way of efficiently caching activation frames of tasks 
in the processor register set so as to minimize the penalty incurred on switching from one 
thread to another. We developed a write-through caching scheme that caches activation 
frames in a set of register windows. We have also proposed a scheme which allows a very 
high degree of look-ahead in the instruction stream. In other words, a processor can easily 
identify the next 15-20 instructions to be executed. We accomplish this by switching threads 
even on conditional branch instructions—which are nondeterministic in the sense that the 
flow of control beyond such instructions is not known until after the instruction is executed. 
Putting these two schemes together, we get an architecture which permits switching between 
threads with minimal (potentially zero) penalty. Further, the high degree of look-ahead in 
the instruction stream may offer several advantages that have eluded processor-pipeline
designers in the past. We are currently examining these. 


5.9 Functional Databases 


Michael Heytens continued his investigation into the synthesis of databases and functional 
languages, treating an update transaction as a declarative specification of a new version of 
the database, inspired by the treatment of I-structures in Id. After completing the design of 
a kernel database language to express such updates, he has begun implementing a prototype, 
based on ideas from compiling Id to P-RISC machines. 


6.1 Introduction


In the last year we have worked hard at preparing a cogent proposal for the final design 
and the construction of CAM-8, a high performance cellular automata multiprocessor. The 
proposal has finally been funded by DARPA, and we are extremely busy now working to 
deliver the goods.


Some of the more theoretical research that we managed to carry out while we were waiting 
for an answer to our proposal, and that is described in this report, will continue at a slower 
pace for a while, until the design of the VLSI chip that constitutes CAM-8’s “heart” is shipped 
to the foundry. 


Our activity has concentrated on the following areas, discussed in more detail below: 


• Relativistic invariance in parallel computations.
• Solid-body motion in cellular automata.
• What variational principles may look like in discrete systems.
• Further design of CAM-8, a large cellular automata machine for (mainly) physics emulation.
• Symmetric and asymmetric interface formation in conservative interacting-particle systems.
• Pattern recognition by texture-locked loop.
• Identification and experimental determination of specific ergodicity in invertible dynamical systems.


6.2 Relativistic Invariance in Parallel Computations 


We have continued studying the problem of how relativistic invariance may emerge at the 
macroscopic levels in cellular automata and similar discrete systems, in which such invariance 
is meaningless at the microscopic level (see previous progress report). 


A preliminary report on last year’s work has appeared in [299]. Work is in progress to 
generalize those results to more than one dimension [292]. 


An alternate way of probing the relationship between Lorentz invariance and cellular au- 
tomata is through the study of wave equation models in one or more dimensions. Note that 
while conventional computational models use discrete space but continuous state variables 
at each site, here we are trying to get the same behavior with binary state-variables. 
The one-dimensional version is especially tractable. Not only do we have cellular automata
rules that satisfy the wave equation exactly; we can also evaluate in closed form, via com- 
binatorial arguments, phase-space averages over the entire set of states of, say, a lattice 
string, as well as dynamical and statistical properties involving kinetic and potential ener- 
gies, mean-square sums of the amplitudes of the normal modes, and the moments of the 
Fourier components of the system. 
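
As an illustration of how a binary rule can satisfy a wave equation exactly, consider the following minimal sketch. It is a textbook-style construction chosen for this example and is not necessarily the rule used in the work reported here: each site of a ring holds one right-moving and one left-moving bit, and the amplitude u is their sum. Because u(x,t) = R(x-t) + L(x+t), the discrete wave equation u(x,t+1) + u(x,t-1) = u(x+1,t) + u(x-1,t) holds identically. The function names are hypothetical.

```python
def step(right, left):
    """One tick on a ring: right-movers shift by +1, left-movers by -1."""
    return right[-1:] + right[:-1], left[1:] + left[:1]

def amplitude(right, left):
    """Amplitude at each site is the number of particles there (0, 1 or 2)."""
    return [r + l for r, l in zip(right, left)]

def satisfies_wave_equation(right, left, ticks=20):
    """Check u(x,t+1) + u(x,t-1) == u(x+1,t) + u(x-1,t) at every site and tick."""
    n = len(right)
    prev = amplitude(right, left)
    right, left = step(right, left)
    cur = amplitude(right, left)
    for _ in range(ticks):
        right, left = step(right, left)
        nxt = amplitude(right, left)
        for x in range(n):
            if nxt[x] + prev[x] != cur[(x + 1) % n] + cur[(x - 1) % n]:
                return False
        prev, cur = cur, nxt
    return True

ok = satisfies_wave_equation([1, 0, 0, 1, 1, 0, 1, 0],
                             [0, 1, 1, 0, 0, 0, 1, 1])
```

The check succeeds for any initial bit patterns, since the identity follows from the shift rule alone.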


Because these implementations use particle-like entities to simulate wave phenomena, they
also naturally lend themselves to the study of quantum mechanical phenomena, especially
vis-à-vis Feynman path-integral methods. Generalizing our results for the n-dimensional
wave equation to the corresponding Dirac and Weyl equations, we have obtained a novel
lattice method for simulating single-particle quantum phenomena, in the spirit of the
Bohm interpretation.


6.3 Solid-body Motion in Cellular Automata 


This new area of work is related to that of the preceding section. 


There has been much excitement over recent work on fluid modeling with cellular automata. 
These models have been basically point models: all properties of the fluids have been rep- 
resented by the contents of individual cells. As one simulates a larger and larger range of 
material parameters, this approach requires more and more bits in each cell, with an
exponential increase in the size of the lookup table (current simulations of 24-particle-per-site
lattice gases, done on a CRAY X-MP, employ about one gigabit of fast memory for this purpose
[121]!).


Viewing cellular automata as a model of fine-grained parallel computation, there are tremen- 
dous technological advantages in finding and using simple CA rules. 


In nature, complex materials are made up of simple parts (atoms) held together in groups by
various adhesive and cohesive forces. We would like to find cellular automata models with
analogous characteristics: information about complex material properties should be spread
out among a collection of cells. This involves the problem of simulating forces between
particles in an appropriate manner (with momentum and energy conservation, and preserving
reversibility) so that we can have collections of particles (bodies) moving together and
interacting with one another. Since this approach provides an alternative to using extremely
complex rules, we consider the problem of not knowing how to make moving bodies (and
forces) in cellular automata to be a major obstacle standing in the way of extensive use of
cellular automata for physical modeling.


This problem is closely related to the relativity discussion of the previous section: in a
relativistically-invariant system, we have the same physics in any inertial frame. We can
therefore have bulk motion of macroscopic (and microscopic) bodies, while their internal
dynamics and chemistry remain essentially unchanged. Thus if we can have (collective)
bodies at all in a relativistically invariant system, we automatically have moving bodies.
The extent to which the opposite implication holds (moving bodies imply relativity) is an
interesting question, which this research will also address.


Understanding the physical aspects of the discrete entities used in our models in order to 
account for deformations of solid bodies and the attendant restoring forces will lead to new 
intrinsic measures of physical interest, such as discrete stress tensors. 


Among the many preliminary explorations we have made in this area, we have studied in 
some detail a simple cellular automaton model of string-like objects freely moving in space 
(in one, two, or three dimensions) and interacting with one another [78]. The basic object 
can be thought of as a chain of point-masses connected by springs; each point can have 
adjustable mass and momentum, and each link adjustable potential energy. Longitudinal and 
transverse vibrations are supported, as well as average bulk motion and collisions between
objects, with strict conservation of the above quantities. Some applications of this model
are under investigation [77].


6.4 What Variational Principles May Look Like in Discrete Systems


The variational principles of mechanics characterize the solutions of certain differential equa- 
tions as continuous functions for infinitesimally small variations of which the value of certain 
continuous functionals remains constant. 


In many “granular” dynamical systems, such as cellular automata, both the independent 
variables (space and time) and the dependent ones (state variables) are discrete, and thus 
do not admit of infinitesimally small variations. Clearly, variational principles in their tra- 
ditional form cannot be employed here. On the other hand, the fundamental role played by 
variational principles in mathematical physics makes one suspect that something having the 
same flavor should be available in the analysis of discrete systems. 


Indeed, as soon as one studies these systems from a macroscopic, combinatorial viewpoint, 
variational principles emerge with such regularity and strength as to make one believe that 
the whole approach should be reversed. Instead of taking continuous variational principles 
as the paragon, and looking for some imitation of them in discrete systems, one may take 
as a productive working hypothesis that the prototypical variational principles arise from 
combinatorics. Variational principles appear with such regularity in physics not because 
they represent some deep-seated feature of physics proper, but because they are a corollary of 
general combinatorial laws and are bound to arise whenever one considers systems consisting
of a large number of elements.
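
A very small counting example may make the flavor of this hypothesis concrete. It is a generic illustration chosen for this purpose, not a model taken from the work reported here: for n two-state cells, the macrostate "k cells are on" is realized by C(n, k) microstates, and the most probable macrostate is the one at which the logarithm of this count is stationary. The function names are hypothetical.

```python
from math import comb, log

def log_multiplicity(n, k):
    """Log of the number of microstates of n binary cells with exactly k ones."""
    return log(comb(n, k))

def most_probable_macrostate(n):
    """The k that maximizes the number of microstates."""
    return max(range(n + 1), key=lambda k: comb(n, k))

n = 1000
k_star = most_probable_macrostate(n)

# Near the maximum a unit change in k barely changes the log-count,
# which is the discrete analogue of a vanishing first variation.
variation_at_peak = log_multiplicity(n, k_star + 1) - log_multiplicity(n, k_star)
variation_far_away = log_multiplicity(n, 101) - log_multiplicity(n, 100)
```

For n = 1000 the maximum sits at k = 500, where the first difference of the log-count is close to zero, while far from the maximum it is of order one.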


We have investigated the above topics in a variety of settings, having in mind (a) the emer- 
gence of the concept of energy out of microscopic combinatorics, and (b) the connection 
between conserved quantities and symmetries. In particular, we have discovered cellular 
automata models displaying exact harmonic motion not only for infinitesimal perturbations,
but also for arbitrarily large displacements. We have started studying the equilibrium
configurations of such systems in various dissipative regimes, establishing a connection between
the variational principles that govern such configurations at a macroscopic level and the
combinatorics that governs them at the microscopic level.


6.5 CAM-8—A Large Cellular Automata Machine Suited for 
Physics Emulation 


We have completed the basic design of CAM-8, a large, high performance cellular automata 
machine which, for its intended areas of application, will be by far the fastest computer in 
the world. Indeed, this machine will constitute a “microscope” into “computational worlds” 
that were until now inaccessible (see previous progress report for more details), and thus 
will stimulate real, practical use of cellular automata as a modeling environment. Partial 
funding for the actual development of this machine has been granted by DARPA, starting 
January 1989. More conceptual aspects of this architecture, in particular in the context of 
simulation of stylized physical systems, fall within the scope of our NSF contract.


CAM-8 is the next generation in a line of Cellular Automata Machines (CAMs) developed
at the MIT Laboratory for Computer Science, which are already used by many investigators.
The essential elements of the CAM-8 architecture and some of its intended applications are
reported in [232][298]. (To make the conceptual aspects and the potential applications of
such machines accessible to a wide audience, we have written a book, Cellular Automata
Machines: A New Environment for Modeling, which constitutes a comprehensive introduction
to the subject and illustrates the use of an earlier machine, CAM-6, which is in commercial
production. We have just completed the second edition of software and documentation for
CAM-6 [71].)


The functional architecture of CAM-8 is fundamentally that of a cellular automaton—where 
a large number of identical atomic processors are uniformly interconnected to form an indef- 
initely extended two or three dimensional network (“polynomial interconnection” architec- 
ture) and operated in synchronism. This approach gives CAM-8 unmatched performance in 
dealing with discrete, fine-grained models of systems whose topology reflects that of ordinary 
spacetime. 


Our specific implementation of the basic cellular automaton plan includes certain refinements 
recommended by recent theoretical developments, and makes use of a number of original 
solutions suggested by the current technological context. Some of these features allow CAM-8
to retain a high level of performance even in certain areas where one might expect that an
“exponential interconnection” architecture (e.g., tree or hypercube) would be mandatory. 


For many applications, this machine may be visualized as a volume of simulated
programmable matter in which a large variety of experiments on spatially-extended physical
systems can be performed rapidly and conveniently; a useful metaphor for this is a “silicon
wind-tunnel.” Other examples include the simulation of physical phenomena such as
diffusion, aggregation, and phase separation; the study of properties of materials such as plasma
and alloys, and of chemical reactions; the exploration of certain models of fundamental
physics; and a number of practical applications such as the study of how waves propagate
in a nonhomogeneous medium and are reflected by arbitrarily shaped obstacles (e.g., radar
and sonar echo analysis).


In addition, CAM-8 will constitute a powerful computer for many information processing
applications dealing with fine-grained structures having a high degree of regularity in at
least two dimensions, for instance a “silicon retina” with real time performance. Indeed,
this appears to be an ideal architecture for many pattern recognition and tracking tasks
[300].


Further, CAM-8 will provide a natural environment for the simulation of large scale logic
circuits; in particular, for exploring the potential of reconfigurable circuits (“downloadable
hardware”).


Finally, a machine of the functionality of CAM-8 will be indispensable for designing and
emulating the algorithms that will constitute the firmware of a new generation of fully
parallel cellular automaton ultracomputers.


6.6 Symmetric and Asymmetric Interface Formation in Conservative Interacting-particle Systems


We have continued work on interface formation in immiscible fluids, simulated on the basis of
microscopic first principles. In brief, we study the approach to equilibrium of a system 
consisting of a large number of particles of two kinds, coupled by local interactions, under 
the constraints of strict invertibility and conservation of particle species and total energy. 


To this end, we have equipped one CAM-6 unit with extended processing tables, capable of
handling the more complex local interactions required by these models. Besides refining and
extending work done in this field by others on symmetric interface formation (which entails
only binary interactions), we have started exploring the more challenging area of multiplet
interactions, which has allowed us to model, among other things, asymmetric surface tension
effects.


The necessary energy bias in the rule due to curvature was recognized to be an approximation
to a new distributed quantity that we have called winding number density, and the
ramifications of this quantity are being investigated. These studies have also yielded
benefits by suggesting the development of graph related algorithms and topological concepts for
cellular automata [291].


6.7 Pattern Recognition and Tracking by Texture Locked Loops 


We have continued working on a pattern recognition method that is applicable to a limited
but pervasive class of patterns: typically, natural landscape features such as rivers,
coastlines, urban agglomerations, etc., as they may appear on a satellite photograph; fingerprints
and other biological constructs; and, in general, textures and structures whose long range
spatial correlations are ultimately explainable in terms of the repeated action of simple local
mechanisms [300]. This method, which can be thought of as a generalization of the well-known
phase-locked loop, is insensitive to large amounts of noise; it can take good advantage
of the peculiar tradeoffs in computational resources offered by fine-grained parallel processors
(such as Cellular Automata Machines, the Connection Machine, and the Massively Parallel
Processor); finally, by its very nature, this method is to a certain extent “aware” of its
capabilities and its limitations.


This approach to pattern recognition has been presented to several technical audiences (GE 
Research Labs, Naval Laboratory, and Lincoln Labs) and has been well received. We are 
now exploring specific applications. 


6.8 Specific Ergodicity 


We have started working on a new theme, namely the identification of a new quantity of
interest in dynamical systems, and its experimental determination in a number of representative
cases.


Specific ergodicity asks, for an invertible cellular automaton, what fraction of the total in- 
formation needed to identify an individual state is devoted to specifying the position of this 
state on its orbit. We give empirical evidence that this question has a definite answer. A 
preliminary report on this work appeared in [299]. 
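
To make the question concrete, the following sketch enumerates the orbits of a tiny invertible cellular automaton and compares the information needed to locate a state on its orbit with the information needed to identify the state outright. Everything specific here is an assumption made for illustration: the second-order rule (new = f(current) XOR previous, which is invertible for any f), the choice of f, the ring size, and the averaging convention. The precise definition used in the reported work is the one given in [299].

```python
from math import log2

def make_step(n):
    """Second-order rule on a ring of n cells; a state is a pair (previous, current)."""
    mask = (1 << n) - 1
    def rot_left(v):
        return ((v << 1) | (v >> (n - 1))) & mask
    def rot_right(v):
        return ((v >> 1) | (v << (n - 1))) & mask
    def step(state):
        prev, cur = state
        new = (rot_left(cur) | rot_right(cur)) ^ prev  # f = OR of neighbors (arbitrary)
        return cur, new
    return step

def orbit_lengths(n):
    """Map every state to the length of the orbit (cycle) it lies on."""
    step = make_step(n)
    length = {}
    for prev in range(1 << n):
        for cur in range(1 << n):
            start = (prev, cur)
            if start in length:
                continue
            orbit = [start]
            state = step(start)
            while state != start:      # terminates because the map is a permutation
                orbit.append(state)
                state = step(state)
            for s in orbit:
                length[s] = len(orbit)
    return length

def orbit_information_fraction(n):
    """Average of log2(orbit length) over all states, divided by the total bits 2n."""
    length = orbit_lengths(n)
    average_bits = sum(log2(l) for l in length.values()) / len(length)
    return average_bits / (2 * n)

lengths = orbit_lengths(4)
fraction = orbit_information_fraction(4)
```

Exhaustive enumeration of this kind is only feasible for very small lattices, since the number of states doubles with every added bit.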


The experiments reported in the above reference were performed using a few AT clones full 
time for several weeks, which is equivalent to several hours on a typical supercomputer. In 
view of the exponential complexity of the problem, even moderate improvements on the 
numerical estimates obtained so far would require a drastic increase in computing power. 
For simple cellular automata, a speedup of ≈ 10,000 can be achieved with a dedicated, fully
parallel implementation consisting of a few programmable gate-array chips. We have set up
fast dedicated simulators of this kind for some simple two-dimensional cellular automata,
using the largest available XILINX chips, and we are beginning to collect experimental data.


7.1 Introduction


Mercury is a communications mechanism that supports efficient communication among pro- 
gram modules in a distributed, heterogeneous environment [212][213]. Modules act as clients 
and servers: a server is a module that provides a number of procedures that can be used by 
other modules, called clients, to interact with it. Communication occurs by means of call 
streams; clients make calls to the procedures provided by servers over these streams. A client 
is able to make three kinds of calls: synchronous calls, in which the client waits until the 
call returns before making a subsequent call on that stream; asynchronous calls, in which 
the client can make a number of calls on the stream without waiting and pick up the results 
of the calls later; and sends, which are like asynchronous calls except that the client picks
up results only if a call terminates in an exceptional condition. 
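
The difference between the three kinds of calls can be sketched in a few lines. This is an illustration only and not the Mercury interface: the class CallStream, its method names, and the in-process "server" are hypothetical, and a single worker thread stands in for the ordered delivery of calls on one stream.

```python
from concurrent.futures import ThreadPoolExecutor

class CallStream:
    """Hypothetical client-side stream; calls are executed in the order made."""

    def __init__(self, procedures):
        self._procs = procedures
        self._worker = ThreadPoolExecutor(max_workers=1)  # one worker keeps order
        self._send_failures = []

    def call(self, name, *args):
        """Synchronous call: wait for the result before returning."""
        return self._worker.submit(self._procs[name], *args).result()

    def stream_call(self, name, *args):
        """Asynchronous call: return at once with a future to claim later."""
        return self._worker.submit(self._procs[name], *args)

    def send(self, name, *args):
        """Send: a normal result is dropped; only an exception is kept."""
        future = self._worker.submit(self._procs[name], *args)
        future.add_done_callback(self._record_failure)

    def _record_failure(self, future):
        error = future.exception()
        if error is not None:
            self._send_failures.append(error)

    def failures(self):
        """Wait for outstanding calls, then report exceptions raised by sends."""
        self._worker.submit(lambda: None).result()
        return list(self._send_failures)

    def close(self):
        self._worker.shutdown()

def deposit(amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return amount

stream = CallStream({"deposit": deposit})
balance = stream.call("deposit", 10)                             # synchronous
pending = [stream.stream_call("deposit", n) for n in (1, 2, 3)]  # asynchronous
stream.send("deposit", 5)                                        # result dropped
stream.send("deposit", -1)                                       # exception kept
results = [p.result() for p in pending]
errors = stream.failures()
stream.close()
```

The client pays a full round trip only for the synchronous call; the asynchronous calls and sends are issued back to back and their outcomes are collected afterwards.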


During the current year, we have continued to work on the design and implementation of 
the Mercury communication mechanism. In addition, we have developed a new protocol for 
implementing at-most-once messages efficiently, and have started work on a new project to 
provide an object repository for use in a heterogeneous network. 


7.2 A Formal Specification for Mercury Call Streams 


B. Liskov and L. Shrira have provided a formal specification for Mercury call streams. An 
important goal in specifying Mercury streams is to allow the different language veneers to 
present streams differently to their users. The intention of the specification is then twofold: 
first, to guarantee that the different veneers “understand” each other; and second, to limit 
the information exposed by the veneer operations. The specification deals with the safety 
properties of the protocol; it does not address the performance aspects of call streams. 


The specification uses an event-based model. It defines the events that can be observed by 
users of streams and restricts the legal sequences of those events. The specification allows 
differences in veneers by defining the common primitive events that underlie the veneer
operations. In other words, the specification does not define a set of stream operations that 
all veneers must provide. Instead, veneers are free to define a convenient set of operations. 
However, the meaning of any stream operation provided by a veneer must be defined in 
terms of a legal sequence of the defined events that represents the effect of that operation 
on the stream. For example, in one veneer, user programs at servers might explicitly wait
for the next call to arrive; while in another, a process might be created automatically by 
the veneer when a call arrives without the user code having to wait. The executions of the 
stream operations in both veneers must be explained using event sequences permitted by 
our specification. 


The operations in different veneers need not expose all details of streams. The specification 
defines the most that can be observed by user code; veneers are always free to hide detail. 
For example, user code at the receiver may not be able to observe that a stream is broken 
(i.e., unable to transmit messages) even though our events convey this information.


7.3 Mercury/Argus Veneer


T. Bloom and D. Curtis have been working on the implementation of the Argus veneer. 
Stream calls have been added to the client-side veneer. The promises mechanism [213] is 
implemented with a procedural interface rather than syntactic support. The Mercury catalog 
implementation has been enhanced to provide both service and port registration, and the 
restart/recovery mechanism is fully implemented. At this point the Argus veneer is fully 
functional, with the exception of support for a few additional Mercury types (vspaces) to be 
incorporated. Work has started on designing a test suite for use with all the veneers and on 
performance analysis. D. Curtis and her students have implemented a calendar application
and a distributed login-uid as Mercury services built in Argus.


7.4 Transactions in Mercury


B. Liskov and W. Weihl have developed a design for the protocols to be used in messages 
concerning transactions in Mercury. Mercury transactions will be compatible with those in 
Argus so that Argus servers can be accessed under Mercury. However, some changes from 
the Argus protocols are needed because the constraints in Mercury are somewhat different
from those in Argus. For example, in Argus every remote call must be made as a subaction; Mercury
does not require this. Making a call as a subaction insulates the caller from failures that 
occur in the call: if the call does not complete, or it aborts at the callee, the calling action 
need not abort. If there is no subaction, these circumstances will force the calling action to 
abort, and the messages exchanged in this case must indicate this fact. 


B. Liskov and W. Weihl also defined a new entity called a transaction management server.
Such servers will run at many Mercury nodes and can be used across the net via Mercury
streams. A transaction management server performs various housekeeping chores associated
with transactions on behalf of clients. It is advantageous for two reasons:


1. It reduces the work needed to implement transactions in C and Lisp. Instead these 
veneers can call on the server to do much of the work in implementing transactions. 


2. It can reduce the probability of a failure occurring in the middle of two-phase commit. 
This is possible because the servers can be located at more reliable nodes, and will not 
be directly under the control of users who might, for example, turn off the machine 
they are using. 


The servers themselves are easy to implement, since they are simply specialized Argus
guardians. Also, the server design does not require extra communication that would
delay the execution of user transactions. For example, it is not necessary to communicate
with the server to create a transaction or to make a call. Instead, the server is used only
at two-phase commit and delays the commit of the transaction as seen by the user by one
message delay.


7.5 Efficient At-most-once Messages Based on Synchronized Clocks 


B. Liskov, L. Shrira and J. Wroclawski [214] have designed a new efficient message passing 
protocol that guarantees at-most-once message delivery without requiring communication to 
establish connections. The goal is to be able to accept messages most of the time even when 
the receiving module has no state information stored about the sending module. The scheme 
is interesting because it allows us to efficiently implement at-most-once remote procedure 
calls (RPCs), even when there are large numbers of clients and servers and when clients 
communicate with servers only occasionally. 


At-most-once semantics for RPCs means that a call is guaranteed to be executed at most once 
even when failures occur such as a crash of the receiving module. It is desirable because it 
provides proper semantics even when calls are not idempotent. However, the implementation 
of at-most-once semantics can be expensive because the server needs a way of determining 
whether it has seen a message before. The determination can be made if the server maintains 
some state, known as a connection, for the client. If there is no state, the connection must be 
established, which typically requires a pair of messages to be exchanged between the client 
and the server. If the connection is used for many calls, the cost of the connection setup can 
be amortized across all of them. If there are only a few calls, the overhead is high relative to 
useful work. In the worst case, only one call will be made on the connection, and the cost 
of the call is doubled. Yet this case may be quite common; it corresponds to clients using 
servers only occasionally. 


To avoid the cost in this common case, systems have provided at-least-once semantics, which 
provides only weak guarantees about how many times a call is executed. For example, 
even when a call terminates normally, it may have been executed more than once. Some 
systems provide at-least-once semantics as the only option; others provide it as an alternative 
available to the client if desired. Both approaches are undesirable: with only at-least-once 
available, the application programmer must cope explicitly with the problems arising from 
non-idempotent calls. Things are better when both are available, but the communication 
system is more complicated than if there is just one choice. 


Our work shows that it is practical and efficient to provide only at-most-once semantics. The 
method allows calls to be made without prior communication to establish a connection. Ours 
is not the first method to do this; the Delta-t protocol [307] also avoids connection setup. 
However, we use a different technique based on loosely synchronized, monotonic clocks. Our 
protocol can easily tolerate the clock skews provided by existing clock synchronization
protocols; these skews are typically less than 100 milliseconds. If the rare event of unsynchronized
clocks does occur, the protocol continues to work correctly although there is a degradation of
performance. The protocol requires that clocks at servers that survive crashes be monotonic; 
it does not rely on properties of clients’ clocks for correctness. 
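
The general idea of using timestamps to reject duplicates without a connection handshake can be sketched as follows. This is a simplified illustration under stated assumptions and not the protocol of [214]: the class and field names, the fixed state lifetime, and the cutoff rule are hypothetical, and crash recovery (which is where the monotonic server clock matters most) is left out entirely.

```python
class Receiver:
    """Accepts each (sender, timestamp) message at most once."""

    def __init__(self, lifetime, now):
        self.lifetime = lifetime      # how long per-sender state is retained
        self.last_seen = {}           # sender -> newest timestamp accepted
        self.cutoff = now - lifetime  # with no state, older messages are refused

    def _expire(self, now):
        # The cutoff never moves backwards (the receiver clock is monotonic).
        self.cutoff = max(self.cutoff, now - self.lifetime)
        # State is dropped only once the cutoff itself rejects any duplicate.
        for sender in [s for s, ts in self.last_seen.items() if ts <= self.cutoff]:
            del self.last_seen[sender]

    def accept(self, sender, timestamp, now):
        """Return True if the message should be delivered, False if refused."""
        self._expire(now)
        newest = self.last_seen.get(sender, self.cutoff)
        if timestamp <= newest:
            return False              # duplicate, or too old to be sure
        self.last_seen[sender] = timestamp
        return True

r = Receiver(lifetime=10, now=100)
first = r.accept("a", 99.5, now=100)       # no prior state, recent timestamp
duplicate = r.accept("a", 99.5, now=100.2) # state still held
late_copy = r.accept("a", 99.5, now=120)   # state expired, cutoff refuses it
fresh = r.accept("b", 119.9, now=120)      # new sender, no handshake needed
slow_clock = r.accept("c", 50, now=120)    # badly skewed sender clock
```

In this sketch a sender whose clock lags badly has its messages refused rather than delivered twice, so clock skew costs performance but never duplicates a delivery.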


We used the message protocol to implement at-most-once RPCs based on the SunRPC library 
and compared our performance with at-least-once and at-most-once RPCs already available 
in the SunRPC library. Our performance measurements indicate that at-most-once RPCs 
can be provided at the same cost as less desirable ones that do not guarantee at-most-once
execution. 


7.6 Object Repository 


B. Liskov, L. Shrira, and a group of students have been working on the design of an
Object Repository. Programs of today make use of file systems to store data that must
survive from one day to the next. Programs of the future will use object repositories instead.
Object repositories are better suited to the needs of programs and users because they correct
several deficiencies of file systems, as discussed below.


Our repository will provide the following features: it will be language independent, store
typed objects, support atomic transactions, be both highly reliable and highly available,
and control access to its objects. The repository fits in well with the work on Mercury,
which provides a method for clients to use the repository through its call streams and provides
a language-independent type system that can be used in the repository. It will also be a
useful service to be made available through Mercury.


Some aspects of the repository are: 


Objects vs. Files: An object repository stores objects instead of files. Objects differ from 
files in three important ways. First, they are often small. File systems tend to be biased 
toward large objects, so that users must combine small objects together into large ones to 
use the system efficiently. By contrast, an object repository must be engineered to work 
efficiently for small objects as well. 


Secondly, an object repository knows about the types of objects stored in it. File systems 
do not have such information, so they provide no help for users to avoid type errors. An 
object repository does provide such help. Of course, the types in use in the repository must 
be independent of particular programming languages, since we want to allow many different 
languages to use, and communicate through, the repository. We already have such a type 
system, namely, the type system developed for communication in Mercury. Furthermore, the 
Mercury library can be relied upon to store information about types, and to give a system- 
wide meaning to types, thus permitting us to avoid type errors in the object repository. 


The type system for the object repository must include abstract data types because the 
repository will need to invoke operations of a type when processing queries. The operations 
must be defined by the person who defines the new type. 


The third point is that objects in an object repository may refer to other objects in the
repository, thus representing the kinds of sharing structures that are often useful in programs.
By contrast, file systems do not allow files to refer to other files in a way understood by
the system. Having interconnected objects raises interesting questions related to naming 
the objects, reading complex objects, allocating and deallocating memory for objects, and 
garbage collection. 


Transactions: Interactions with the object repository will occur within atomic transactions;
we expect to use the Mercury mechanism here. However, an object repository is likely to 
use a transaction mechanism in a nonstandard way. For example, a program development 
support system might be implemented on top of an object repository. When a person is 
working on a new release of a particular module, he is likely to have that object locked for a 
long time. We would not want to provide such an ability by running the entire production 
of the new release as a transaction, since holding locks for a long time is not good for 
system performance. Instead we need to devise a different kind of interface, probably of the 
“check-out/check-in” variety. The design of such an interface is a challenging problem.


Availability and Reliability: The repository must be both highly available, so that
individual objects are very likely to be usable when needed, and highly reliable, so that
information entrusted to it is very unlikely to be lost. The plan is to store objects in the
repository at a small number of server nodes. To speed up interaction with the repository, 
clients will maintain caches containing recently used objects. To achieve high availability, we 
will need to use replication. Our new primary copy technique [246] should be good for the 
repository. Copies of each object will reside at several servers; the database as a whole will 
be partitioned so that the load at the servers that implement the repository will be balanced. 


If the servers that store copies of an object are sufficiently failure independent, then a highly 
available system is also highly reliable. We may choose to achieve failure independence by 
equipping our servers with uninterruptible power supplies. In addition, we plan to investigate both
archival mechanisms and backup mechanisms to be used when catastrophes occur.


Access Control: Sharing is only useful if it can be controlled. For example, sensitive 
data will be stored in the repository only if reading can be controlled. Such control can be 
achieved by access control mechanisms based on the authentication methods being developed 
for Mercury. 


Language Interface: T. Bloom and S. Zdonik (of Brown University) have been working on 
issues of object-oriented database design. They have been looking at the problem of merging 
object-oriented languages and databases into seamless database programming languages with 
uniform access to all objects. This work is related to the way the object repository will 
interface to Mercury host languages. 


8.1 Introduction


The 1988-89 academic year marked the final year of operation of the Parallel Processing 
Group, due to the group leader’s departure from MIT. 


The group’s focus has been to learn how to build parallel processors that can be programmed 
for general purpose applications. The group’s efforts are based on the parallel Lisp language 
Multilisp [147][149][148]. Members of the group have worked on several aspects of parallel
processing: implementation of parallel Lisp systems, speculative computation, applications 
for parallel Lisp, parallel program debugging and tuning aids, and design of architectures
well suited for parallel Lisp.


A major milestone during the year has been the completion of the Mul-T high performance 
parallel Lisp system [189], which compiles code for the Encore Multimax multiprocessor and 
largely obsoletes the group’s earlier (interpreter-based) Multilisp implementation hosted on 
the Concert multiprocessor [147][149][152]. Being smaller and more malleable, the Concert 
implementation of Multilisp continues to be useful for quick experiments with modifica- 
tions of Multilisp, but the Mul-T system has performance that is better by two orders of 
magnitude. 


Other milestones include the successful demonstration of a Multilisp system that supports 
speculative computation and the completion of ParVis, a tool for debugging and tuning par- 
allel Lisp programs. Investigations of the parameters affecting performance of parallel Time 
Warp [172] simulations, and of naming problems in Lisp systems, were conducted. Finally, 
a preliminary version of MARCH, a processor architecture capable of efficiently executing 
parallel Lisp programs, was studied via simulation, and directions for future improvement 
were identified. A notable result of this study was the design of control logic for a coherent 
cache that can connect to a multithreaded processor and a split-transaction bus. 


The following sections describe each of the above mentioned aspects of the group’s activity 
in more detail. 


8.2 High Performance Parallel Lisp 


A major accomplishment for the year was the completion (by D. Kranz and E. Mohr) of 
the Mul-T high performance parallel Lisp system [189], which runs on an Encore Multimax 
multiprocessor. This effort began with the T system from Yale, which implements a dialect 
of the Scheme language [1][272]. Mul-T is a parallel version of T with a parallel garbage 
collector. Mul-T is most notable for its high performance: use of the T system’s ORBIT 
compiler [190][188] leads to performance about 100 times faster than the group’s earlier
Multilisp implementation on the Concert multiprocessor. 


Mul-T shows that Multilisp’s future construct can be implemented cheaply enough to be 
useful in a production-quality system. On the Boyer Lisp benchmark, for example, Mul-T 
was able to achieve higher performance even on two processors than is achieved by running
the sequential Boyer program in the sequential T system (whose compiler is as good as or
better than competitive commercial Lisp compilers) [189]. In addition to its performance
advantages, Mul-T includes a set of debugging functions that extend T’s debugging features 
to handle a parallel execution environment. 
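
The behavior of a future can be illustrated with a minimal sketch. It is written in Python rather than Lisp, it is not Mul-T code, and the names fib and parallel_fib_sum are hypothetical. A future starts evaluating an expression concurrently and immediately returns a placeholder; the consumer blocks only when it touches the placeholder to obtain the value.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Sequential Fibonacci, used here only as a unit of work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_fib_sum(args, workers=2):
    """Create one future per argument, then touch each future for its value."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.submit plays the role of (future (fib n)): it returns at once.
        futures = [pool.submit(fib, n) for n in args]
        # f.result() plays the role of touching: it waits until the value exists.
        return sum(f.result() for f in futures)
```

Python threads do not speed up this CPU-bound work, so the sketch shows only the semantics of creating and touching futures, not the performance reported above.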


Innovations in Mul-T include the use of inlining as a run-time mechanism to increase task 
granularity and reduce task-management costs, and the concept of groups of tasks which 
provide convenient units to manipulate when debugging. Mul-T has been made available 
at no charge through network file transfer, or at a nominal charge from Encore Computer
Corporation. 


8.3 Speculative Computation 


Our investigation of the design of mechanisms to support speculative computation in Mul- 
tilisp continues. Speculative computation is eager evaluation where the result(s) of the 
evaluation may be unnecessary. It is a gamble whereby one trades additional, possibly un- 
necessary, computation for potentially faster execution. Speculative computation contrasts 
with mandatory computation, in which all computations are presumed to be necessary. Spec- 
ulative computation requires a means to control computation to favor the most promising 
computations, and the ability to abort computation and reclaim computation resources. 


Our interpreter-based implementation of Multilisp [147][149][148] has been extended (by 
R. Osborne) with constructs for speculative computation, and performance of several spec- 
ulative programs has been measured. These measurements demonstrate that performing 
computations in parallel before their results are known to be required can yield performance 
improvements over conventional approaches to parallel computing. On the Boyer theorem- 
proving benchmark and a traveling-salesman application, speculative computation yielded 
performance improvements of up to a factor of 2 over the best program using only mandatory 
constructs, while at the same time eliminating the tuning of parameters needed to achieve 
that performance in the mandatory arena. On a heuristic program to solve the 8-puzzle 
[249], a performance increase of a factor of 26 was measured. 


The main conceptual contribution of this work is a sponsor model that provides a framework 
for management of speculative computation. This sponsor model handles control and recla- 
mation of computation in a single, elegant framework. A sponsor is an agent that controls 
the allocation of resources to computation. A sponsor supplies attributes (such as a priority, 
or a claim on a certain quantity of compute time) to computations that are sponsored by 
it. Every running task is sponsored by one or more sponsors, and every sponsor can sponsor 
several tasks, as well as sponsor other sponsors. Thus, sponsors can be organized into
hierarchical networks that mirror the structure of the computation, and sponsors are a modularity
construct that provides a “handle” on a subcomputation that can be used without explicit
reference to the set of tasks currently performing that subcomputation. The contribution
of the current work is the development of the basic sponsor model (inspired by the work of 
W. Kornfeld and C. Hewitt [186]) into a concrete mechanism, illustration of how it can be 
implemented with acceptable efficiency, and demonstration of its application to speculative
computation scenarios including side effects and complex inter-task dependencies. 
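
The following is a minimal sketch of the sponsor idea, not the Multilisp mechanism itself. It assumes, for brevity, that each task has exactly one sponsor, that a sponsor cannot grant more priority than its own sponsor grants it, and that aborting a sponsor aborts everything beneath it. The class and function names are hypothetical.

```python
class Sponsor:
    """Hypothetical sponsor: supplies a priority to whatever it sponsors."""

    def __init__(self, priority, parent=None):
        self.priority = priority
        self.parent = parent        # the sponsor that sponsors this sponsor
        self.aborted = False

    def effective_priority(self):
        # Assumption: priority is capped by every sponsor higher in the hierarchy.
        if self.parent is None:
            return self.priority
        return min(self.priority, self.parent.effective_priority())

    def is_aborted(self):
        # Aborting a sponsor reclaims the whole subcomputation below it.
        return self.aborted or (self.parent is not None and self.parent.is_aborted())

    def abort(self):
        self.aborted = True


class Task:
    def __init__(self, name, sponsor):
        self.name = name
        self.sponsor = sponsor


def next_task(tasks):
    """Pick the live task whose sponsor grants the highest priority."""
    live = [t for t in tasks if not t.sponsor.is_aborted()]
    if not live:
        return None
    return max(live, key=lambda t: t.sponsor.effective_priority())
```

Because the scheduler consults only sponsors, a speculative branch can be deprioritized or aborted through its sponsor without enumerating the tasks that currently carry it out.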


8.4 Naming in Lisp Systems 


A separate exploration (by J. Loaiza) focused on the problem of defining and creating mod- 
ules in Lisp. Three general styles of specifying and creating modules were identified. The 
three styles vary in ease of use, simplicity, and modularity. A module-system design was 
developed that treats a module as an independent object that requires a set of input values 
and produces a set of output values. The code that interconnects modules is kept separate 
from the code that implements modules. Methods of making dynamic modifications to a 
system of modules were also explored. 


8.5 Analysis of Scheduling in Parallel Simulations 


Analysis of a message-based system for concurrent simulation in Multilisp, based on the Time
Warp system of D. Jefferson [172], was completed (by M. Ma). Time Warp is based on the 
paradigm of tasks exchanging messages. Each message has a “virtual,” or simulated time; 
messages received by a task are to be processed in order of increasing virtual time. Time 
Warp uses an optimistic concurrency control strategy in which processors eagerly process 
available messages even when it is possible that a given message will be processed before 
another message with a lower virtual time but a later real time of arrival at its destination 
task. When this occurs, the destination task must be backed up to an earlier state and 
re-executed from that point so that the required virtual time order is not violated. Backups 
clearly represent wasted work and should be minimized. The number of backups is thus an 
important performance measure, along with the total time taken to execute a simulation. 
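
The backup step can be sketched as follows. This is a simplified illustration, not the group's Multilisp code: the task's state is a single integer, a snapshot is kept after every message, and the cancellation of messages that the task itself sent before the backup is omitted. The class name TimeWarpTask and the state-update rule are assumptions made for the example.

```python
import bisect


class TimeWarpTask:
    """Hypothetical task whose state depends on the order of its messages."""

    def __init__(self):
        self.processed = []    # (virtual_time, payload), sorted by virtual time
        self.snapshots = [0]   # snapshots[i] is the state before processed[i]
        self.state = 0
        self.backups = 0

    def receive(self, vtime, payload):
        times = [t for t, _ in self.processed]
        pos = bisect.bisect_right(times, vtime)
        redo = []
        if pos < len(self.processed):
            # A message with a later virtual time was already processed:
            # back up to the state saved just before it and re-execute.
            self.backups += 1
            redo = self.processed[pos:]
            self.processed = self.processed[:pos]
            self.snapshots = self.snapshots[:pos + 1]
            self.state = self.snapshots[pos]
        for t, p in [(vtime, payload)] + redo:
            self._process(t, p)

    def _process(self, vtime, payload):
        self.processed.append((vtime, payload))
        self.state = self.state * 2 + payload   # order-sensitive update
        self.snapshots.append(self.state)
```

Delivering messages with virtual times 1, 3, 2 ends in the same state as delivering 1, 2, 3, at the cost of one backup, which is the quantity the measurements above try to minimize.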


Parameters that affect the performance of our simulation system were investigated using var- 
ious simulations, including digital circuits, a queuing system, and simulations of message flow 
in communication networks with grid and butterfly topologies. The parameters investigated 
include different choices of scheduling algorithm, based not only on whether the scheduling 
algorithm was static, dynamic or nondeterministic, but also on the method of grouping tasks 
into “partitions” (scheduling units). Certain methods of partitioning yielded much better 
results (processing time and number of backups) than others. A model describing the
underlying phenomena that govern the performance of Time Warp simulations was developed
and evaluated. A key predictor of performance is the extent to which the scheduling method 
chosen succeeds in minimizing the variations in virtual times between partitions (variations 
within a partition were not so important). 


8.6 Parallel Program Development Aids 


A difficult and important problem in programming parallel processors lies in understanding 
the behavior of programs. Examples include determining where time is being spent in various 
parts of a program, which parts are being executed in parallel, and where the bottlenecks
are located. L. Bagnall completed development of a tool, ParVis (Parallel Visualization), 
for visualizing the execution of Multilisp programs. 


During a Multilisp program run, ParVis records the time of events representing task state 
transitions and intercommunication, such as task creation, blocking, resumption, and ter- 
mination. ParVis can then generate a graphical display of this information. The system 
provides a display with interactive features such as scrolling and zooming, which allow the 
user to examine various parts of the display. 


Because the program visualization utility is not interactive, but creates a display after a 
program run, ParVis communicates via data files rather than via an interactive network 
connection. Implementations of both Multilisp and Mul-T have been modified to generate 
the necessary trace data files. 


Because the additional information described by ParVis can lead to large, complex displays, 
a filter language is provided to allow the programmer to specify the displayed items that are 
of interest. Particularly notable is the interface between the filter language and the graphical 
display, which allows easy inclusion of display elements in filter definitions. 


ParVis has already been used extensively by several group members to find performance 
bottlenecks and analyze the effect of different scheduling disciplines (e.g., for speculative 
computation) on performance. 


8.7 Architecture for Parallel Processing 


An effort to learn how innovations in parallel architectures could improve the performance 
of parallel Lisp programs is being pursued (by R. Halstead, D. Nussbaum, H. Takagi, and 
I. Vuong-Adlerberg) through the development and evaluation of a processor architecture 
called MARCH (Multilisp ARCHitecture). This project began in the previous year with the 
specification of MASA [150], a processor architecture inspired by the HEP-1 [187] and SPUR 
architectures [162][297]. MARCH differs from MASA in a number of details concerning
procedure linkage, trap handling and task management, which have been revised in light of 
experience with writing run-time support routines for MASA. 


Like MASA, MARCH features several non-overlapping register sets that can be used in a 
manner like SPUR’s register windows to reduce memory accesses associated with procedure 
invocation. Alternatively, register sets can be allocated to concurrently executing tasks 
assigned to the same processor. MARCH thus aims to make procedure linkage efficient 
and make task creation equally efficient. MARCH’s pipeline, like that of the HEP-1, can be 
filled by issuing instructions from different tasks on consecutive clock cycles. The fast context 
switching implicit in the HEP-style instruction issue should also help bridge memory access 
latencies by allowing other tasks’ instructions to be executed while awaiting completion of a
memory operation. 


Like SPUR, MARCH depends on software trap handlers to handle infrequent conditions and
manage processor resources such as register sets. In particular, a scheduler must manage the 
mapping from the (unbounded) set of runnable tasks to the (finite) set of register frames, 
and a frame saver must be invoked whenever a request for a register frame is made and there 
are currently no free frames to satisfy it. 
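
The relationship between the scheduler and the frame saver can be sketched in software. The sketch below is an illustration only: the least-recently-used eviction policy and the names FramePool and acquire are assumptions, and MARCH's actual trap-level interface is not shown.

```python
from collections import OrderedDict


class FramePool:
    """Hypothetical mapping from runnable tasks to a fixed set of register frames."""

    def __init__(self, n_frames):
        self.free = list(range(n_frames))
        self.loaded = OrderedDict()   # task -> frame, least recently used first
        self.saved = set()            # tasks whose registers were spilled to memory

    def acquire(self, task):
        """Return a register frame for task, spilling another task if necessary."""
        if task in self.loaded:
            self.loaded.move_to_end(task)
            return self.loaded[task]
        if not self.free:
            # Frame saver: no free frame, so spill the least recently used task.
            victim, frame = self.loaded.popitem(last=False)
            self.saved.add(victim)
            self.free.append(frame)
        frame = self.free.pop()
        self.saved.discard(task)      # a reloaded task is no longer spilled
        self.loaded[task] = frame
        return frame
```

With two frames and three tasks, the third request forces one spill; every such spill is housekeeping work of the kind the measurements below are concerned with.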


During the year, implementation of a MARCH simulator was completed, the ORBIT com- 
piler from the Mul-T system was retargeted to generate code for MARCH, and basic versions 
of the frame saver, trap handlers, and scheduler were written. As a result, it became possible 
to run actual (but small) parallel Lisp application programs and measure the performance 
effects of different design decisions. The need for improved scheduling performance led to 
the definition (by H. Takagi) of an interprocessor interrupt mechanism for MARCH. Experi- 
ments (by H. Takagi) showed that grouping the scheduling and frame saving functions into a 
separate housekeeper task generally yields better performance than invoking those functions 
via traps. 


As a result of the year’s activities, considerable progress has been made in discovering how to 
achieve MARCH’s initial goal of efficient execution of parallel Lisp programs, but work is still 
needed to reduce several performance costs. Notably, MARCH’s HEP-like instruction-issue 
mechanism only issues an instruction when the previous instruction in the same instruction 
stream has completed—this requires a large number of streams to fully utilize a pipelined 
processor. Also, housekeeping costs are still too high. Although the Parallel Processing 
Group will not continue in existence at MIT, we hope that some of the unfinished work on 
MARCH may continue as part of the APRIL project [9]. 


Another architectural project (pursued by I. Vuong-Adlerberg) concerned the design of a 
coherent cache for MARCH. The cache is based on a shared bus, which is made a split- 
transaction bus for increased bandwidth. MARCH’s ability to interleave several independent 
instruction streams prevents the processor from systematically becoming blocked on every 
cache miss, but presents new challenges for cache design. 


Upon a cache miss, a conventional cache remains unavailable for further processor requests 
until the data has been fetched and the miss has been completely processed. A new cache 
design is needed for a multithreaded processor like MARCH, which will allow the proces- 
sor to continue to access the cache even while misses are being processed. The use of a 
split-transaction bus further complicates the design challenge by increasing the variety of 
asynchronous events to which the cache may need to attend. A cache design to meet this 
challenge was developed [305], the kind of coherence that it provides was formally defined, 
and the validity of the cache design was proven. The literature offered very little pre-existing
support for such proofs, so a formalism was also developed for expressing statements about
consistency and aiding in their proof [305]. 


9.1 Introduction


Research in the Programming Methodology Group has continued to focus on the area of 
distributed computing. In addition to our work on Argus and Mercury, we have also studied 
replication methods, implementation of distributed applications, and theory of distributed 
systems. Our research in these areas is described below. 


9.2 Argus and Mercury 


We have continued our study of how to extend Argus to provide access to Mercury mech- 
anisms. One issue is how to relate Argus types to Mercury types. The mechanism must 
support two activities: building relationships between Argus and Mercury types, and
indicating what relationships to use in making a remote call. Ideally, the method chosen
should make it easy to do things in a standard way, yet make it possible to relate types in
a nonstandard way.


Building a relationship is done by defining an association, which contains two translation 
functions, one mapping from an Argus type to a Mercury type and the other mapping in 
the opposite direction. Typically, an association will be defined as part of implementing 
an abstract type, although it is possible to define one independently as well. Argus will 
provide a number of builtin associations that relate builtin types to Mercury types. No
restrictions are placed on the number of associations for a type; instead, an Argus type can 
be associated with many Mercury types and vice versa. For each Argus type, one association 
can be declared the default. 


Indicating what associations to use in making a call is done as part of the declaration of the 
type of the remote procedure being called. If a default association is desired for a particular 
parameter, only the Argus type need be given for that parameter. Otherwise, the association 
to be used is indicated explicitly. For example, 


h: handlertype (char) returns (int$to_int16) signals (over(real))


indicates that in calls of h, the default association should be used for the character argument, 
and also for the real result in the case where the exception over is signaled. However, if the 
call returns normally, the association int$to_int16 should be used to map the 16 bit integer 
returned by the call into an Argus int. (The default association for ints maps them to 32 bit 
integers.) 
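
The idea of an association as a pair of translation functions, with one default per type, can be sketched outside Argus. The sketch below is in Python and rests on assumptions: the big-endian byte encodings stand in for Mercury values, whose real representation is not described here, and the name int$to_int32 and the helper resolve are hypothetical.

```python
import struct


class Association:
    """Pair of translation functions between an Argus type and a Mercury type."""

    def __init__(self, name, to_mercury, from_mercury):
        self.name = name
        self.to_mercury = to_mercury
        self.from_mercury = from_mercury


# Assumed encodings: big-endian signed integers of 32 and 16 bits.
int_to_int32 = Association(
    "int$to_int32",
    lambda v: struct.pack(">i", v),
    lambda b: struct.unpack(">i", b)[0],
)
int_to_int16 = Association(
    "int$to_int16",
    lambda v: struct.pack(">h", v),
    lambda b: struct.unpack(">h", b)[0],
)

# At most one default association per Argus type.
DEFAULTS = {"int": int_to_int32}


def resolve(argus_type, explicit=None):
    """Use the explicitly named association if given, otherwise the default."""
    return explicit if explicit is not None else DEFAULTS[argus_type]
```

A declaration that names only the type corresponds to resolve("int"), while one that names int$to_int16 corresponds to resolve("int", int_to_int16).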


9.3 Formal Models for Nested Transactions 


William Weihl, working with Nancy Lynch, Michael Merritt, and Alan Fekete, has continued 
working on formal models for nested transactions. In the last year, we have completed a 
draft of a book, the goal of which is to unify and generalize the work done over the past 
several years on modeling algorithms for transaction processing in distributed systems. Our
attempts at unification have resulted in proofs of a number of interesting algorithms, all 
in the same general framework. The development of the general model has also resulted 
in the invention of new algorithms that generalize and extend existing algorithms, both to 
handle nested transactions and to permit more concurrency than is permitted by existing 
algorithms. We have also defined correctness conditions for implementations of atomic data 
types in languages such as Argus. These correctness conditions serve as useful guidelines 
during program design and implementation, in much the same manner as loop invariants 
can be used for sequential programs. 


9.4 Recovery Algorithms 


William Weihl has continued work on formal models and verification techniques for recov- 
ery algorithms for transaction systems. Most previous work treats concurrency control and 
recovery as independent problems. In practice, however, designing a concurrency control al- 
gorithm requires careful consideration of the details of recovery. We have developed a model 
for transaction processing systems that allows concurrency control and recovery algorithms 
to be described abstractly in simple mathematical terms. The interactions between concur- 
rency control and recovery can then be analyzed relatively simply. In a separate step, the 
implementations of the concurrency control and recovery algorithms can each be shown to 
implement the more abstract descriptions used to analyze their interactions. We have used 
the model to analyze the constraints placed by two separate recovery algorithms, update- 
in-place and deferred-update, on conflict-based concurrency control algorithms. We have 
proved necessary and sufficient conditions for a concurrency control algorithm to work with 
each of the recovery algorithms. These conditions are interesting for several reasons. First, 
they give precise bounds on the level of concurrency permitted by each recovery method. Sec- 
ond, they directly lead to new concurrency control algorithms that permit more concurrency 
than previously existing algorithms. Third, the two recovery algorithms are incomparable 
in terms of the constraints each places on concurrency control: each permits concurrency 
control algorithms that the other does not. These results are described in a paper in the 
Proceedings of the 1989 Symposium on Principles of Database Systems. 


9.5 Storage Management for Persistent Memory 


William Weihl and Elliot Kolodner have been working on efficient automatic storage man- 
agement for persistent memory. Crash recovery algorithms for databases require explicit 
interaction with the recovery system to allocate and free stable objects, and do not cope 
with objects that change locations. We have developed garbage collection algorithms for 
crash-tolerant systems. The algorithms are described in a paper in the Proceedings of the 
1989 SIGMOD conference. We are currently looking at incremental and generation-based 
methods. 


9.6 The Meaning of Subtypes


In a thesis finished in December 1988 [199], Gary Leavens describes a verification method 
for object-oriented programs that use subtypes. Object-oriented programs are polymorphic, 
in the sense that the type (or class) of the target of a message may not be known until run 
time; thus, the actual code that will be run as the result of an invocation is not statically 
determinable. Leavens presents the idea of a “simulation relation,” which explains how the 
objects of a subtype can be viewed as objects of a supertype, and shows how simulation 
relations can be used to verify object-oriented programs. The method is a natural extension 
of standard axiomatic techniques for specifying and verifying programs that use abstract 
data types. 


9.7 New Replication Method 


In a thesis completed in May 1989 [192], Rivka Ladin has defined a new replication method. 
This method is an extension of our earlier work [193] on replication techniques. Information 
is stored at a logically centralized service that is accessible to clients by making remote calls 
on its operations. To make the service highly available, so that it is likely to be accessible to 
clients when needed, the service is implemented by a number of replicas. Client operations 
take place at just one replica, and can usually be processed without delay, so the replication 
technique does not slow clients down. Update operations, which modify the state of the 
service, never cause a delay; the replica that performs the update communicates the new 
information to the other replicas in the background by using “gossip” messages. Query 
operations, which observe the state, will be delayed if an update whose effect needs to be 
observed is not yet known at the replica processing the operation; but this occurs rarely. 


Ladin’s method allows clients to indicate how operations are to be ordered. Both queries and 
updates can be required to occur after other updates. This is accomplished by associating 
each update with a unique identifier and having each query and update take a set of uids as 
an argument; the service will ensure that the query or update occurs after all updates whose 
uids are in the set. 
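
The client-specified ordering can be illustrated with a minimal sketch of a replicated key-value service. It is not Ladin's algorithm: the Replica class, its method names, and the rule that an update is applied at a replica once the updates it depends on have been applied there are assumptions made for the example, and the stronger orderings discussed next are not modeled.

```python
class Replica:
    """Hypothetical replica of a key-value service with client-specified ordering."""

    def __init__(self):
        self.log = {}        # uid -> (deps, key, value) for every update heard of
        self.applied = []    # uids in the order they were applied at this replica
        self.data = {}

    def update(self, uid, deps, key, value):
        # Updates are accepted without delay; they take effect once their
        # dependencies have been applied at this replica.
        self.log[uid] = (frozenset(deps), key, value)
        self._apply_ready()

    def query(self, key, deps):
        # A query must observe every update named in deps; if some are not yet
        # known here, the caller has to wait for more gossip and retry.
        if not set(deps) <= set(self.applied):
            return (False, None)
        return (True, self.data.get(key))

    def gossip_to(self, other):
        # Background propagation: pass on every update the other replica lacks.
        for uid, entry in self.log.items():
            other.log.setdefault(uid, entry)
        other._apply_ready()

    def _apply_ready(self):
        progress = True
        while progress:
            progress = False
            for uid, (deps, key, value) in self.log.items():
                if uid not in self.applied and deps <= set(self.applied):
                    self.data[key] = value
                    self.applied.append(uid)
                    progress = True
```

A client that performed update u2 at one replica can pass the set {u2} to a query at another replica; that query is delayed only until gossip has delivered u2 and the updates u2 depends on.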


The client-specified ordering does not provide a way to order operations that occur in parallel. 
To handle this case, Ladin defines two extensions that allow stronger orderings to be defined. 
These extensions increase the applicability of the method so that it can be used in systems 
where most operations are ordered by clients, but occasionally a stronger order is required. 


9.8 Garbage Collection in a Distributed System 


Also as part of her thesis, Ladin has defined a new technique for doing garbage collection of a 
distributed heap. The method uses a centralized service to keep track of inter-node references; 
the service is made highly available by replication, e.g., using the technique described above. 
The method allows nodes to garbage collect independently, using different algorithms if 
desired. After doing a garbage collection, a node informs the service about its references to
other nodes’ objects and inquires about the accessibility of any of its objects that might be 
accessed by other nodes. The method is thus able to detect cycles of inaccessible objects. 
Using the central service has several advantages: it requires fewer messages than if nodes 
communicate with one another directly; it offloads work from the clients to the server, thus 
freeing up the clients to work on behalf of their users; and it scales to large systems. 
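
Why a central view of inter-node references suffices to detect cycles can be shown with a small sketch. The report format used here (a set of remote objects reachable from a node's local roots, plus, for each local object visible to other nodes, the remote objects it can reach) is an assumption made for illustration, as are the names ReferenceService, report, and query.

```python
class ReferenceService:
    """Hypothetical central service that tracks inter-node references."""

    def __init__(self):
        self.rooted = {}   # node -> remote objects reachable from its local roots
        self.paths = {}    # node -> {public local object: remote objects it reaches}

    def report(self, node, rooted, paths):
        """Called by a node after it finishes a local garbage collection."""
        self.rooted[node] = set(rooted)
        self.paths[node] = {obj: set(refs) for obj, refs in paths.items()}

    def accessible(self):
        """All objects reachable from some node's roots via inter-node references."""
        edges = {}
        for node_paths in self.paths.values():
            edges.update(node_paths)
        live = set()
        frontier = [obj for refs in self.rooted.values() for obj in refs]
        while frontier:
            obj = frontier.pop()
            if obj not in live:
                live.add(obj)
                frontier.extend(edges.get(obj, ()))
        return live

    def query(self, objects):
        """Tell a node which of its public objects other nodes can still reach."""
        live = self.accessible()
        return {obj: obj in live for obj in objects}
```

Two objects on different nodes that refer only to each other never enter the live set, so each node learns from its query that its half of the cycle can be reclaimed.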


9.9 High Availability for Linda 


In a thesis completed in August 1988 [313], Andrew Xu defined a method for providing 
high availability for the Linda tuple space [72]. The Linda tuple space is a nonstandard 
memory model that requires less synchronization between reads and writes than a standard 
model. As such, it is of interest for parallel and distributed systems because the reduced 
synchronization can translate into better performance. Previous implementations proposed 
for Linda, however, do not support high availability: if any node containing part of the 
tuple space fails, the memory is lost. Xu’s thesis provides an efficient implementation that 
overcomes this problem. 


9.10 Optimistic Concurrency Control in Distributed Systems 


In a thesis completed in May 1989 [140], Bob Gruber extended optimistic concurrency control 
to work in a distributed system that supports nested atomic transactions. His thesis describes 
two methods. The first uses the fixed action model in which a transaction runs entirely at
a single site and all objects that it uses are copied to that site. In the second, the fixed
object model, objects never move; instead the transaction runs at the sites of the objects it 
uses. The performance of the two methods appears to be roughly the same, although the 
algorithms used in the fixed object model are more complicated. The fixed action model 
is probably the more interesting of the two because it matches recent work in distributed 
object-oriented data bases, in which copies of the objects used by a client reside in the client’s 
cache. 
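
The general shape of optimistic concurrency control, in the spirit of the fixed action model, can be sketched as follows. This is a generic version-checking scheme and not Gruber's algorithm: it has a single store, no nesting, and no distribution, and the names Store and Transaction are hypothetical. A transaction works on private copies and is validated only at commit.

```python
class Store:
    """Hypothetical object store in which every object carries a version number."""

    def __init__(self):
        self.objects = {}    # name -> (version, value)

    def fetch(self, name):
        return self.objects.get(name, (0, None))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # name -> version seen when first copied
        self.writes = {}          # private copies of modified objects

    def read(self, name):
        if name in self.writes:
            return self.writes[name]
        version, value = self.store.fetch(name)
        self.read_versions.setdefault(name, version)
        return value

    def write(self, name, value):
        self.writes[name] = value

    def commit(self):
        # Validation: abort if anything this transaction read has since changed.
        for name, version in self.read_versions.items():
            if self.store.fetch(name)[0] != version:
                return False
        # Install the private copies, bumping each object's version.
        for name, value in self.writes.items():
            version = self.store.fetch(name)[0]
            self.store.objects[name] = (version + 1, value)
        return True
```

No locks are held while a transaction runs; of two transactions that read and then modify the same object, the one that commits second fails validation and must be retried.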


9.11 Stable Storage Service 


In a thesis finished in May 1989 [83], Jeff Cohen designed and partially implemented a new 
stable storage system for Argus. The current implementation of Argus uses a disk at each
node to store the stable information of all guardians at that node. This means that our 
stable storage is not truly stable, since a failure of a single disk can cause the loss of all 
stable information for that node. True stable storage would require two disks per node, and 
the time needed to write to stable storage would be high, since each stable write would need 
to be done to both disks and the two disk writes must happen sequentially [195]. 


To reduce these costs, Cohen worked on the new system, which is based on [92]. In his 
system, stable storage is provided by a stable storage service that can be accessed across 
the network. The service consists of three server nodes. To write or read information at the
service, the Argus system at a guardian must communicate with any two of these servers. 
Each server has a large disk and an uninterruptible power supply. The power supply allows 
the server to handle a write request entirely in primary memory; the new information in 
the write request is written to disk later in background mode. This means that a write to 
an individual server takes a length of time roughly equal to a roundtrip message delay. We 
expect the new service to be faster than the current Argus system because the writes to the 
two servers can be done in parallel and the network roundtrip delay in a local area net is 
less than the delay for a disk write. 
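

The two-of-three access rule can be illustrated with a minimal Python sketch. It is not Cohen's design: the Server and StableStorageClient classes, the version numbers, the read performed before each write, and the dictionaries standing in for server memory are all assumptions made for illustration. For simplicity the sketch waits for every server to answer or fail, rather than returning as soon as two replies arrive.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

QUORUM = 2  # the text says any two of the three servers must be contacted


class Server:
    """Hypothetical stand-in for one server node; a dict plays the role of its memory."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}  # key -> (version, value)

    def write(self, key, version, value):
        if not self.up:
            raise ConnectionError(self.name)
        # Keep only the newest version seen for this key.
        if key not in self.store or self.store[key][0] < version:
            self.store[key] = (version, value)
        return True

    def read(self, key):
        if not self.up:
            raise ConnectionError(self.name)
        return self.store.get(key, (0, None))


class StableStorageClient:
    """Hypothetical client side: talks to all servers, needs two replies."""

    def __init__(self, servers):
        self.servers = servers

    def _ask_all(self, call):
        """Send a request to every server in parallel and collect successful replies."""
        replies = []
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            futures = [pool.submit(call, s) for s in self.servers]
            for f in as_completed(futures):
                try:
                    replies.append(f.result())
                except ConnectionError:
                    pass  # a crashed server simply does not reply
        if len(replies) < QUORUM:
            raise RuntimeError("fewer than two servers responded")
        return replies

    def read(self, key):
        # Any two replies overlap with any earlier two-server write,
        # so the highest version among them is the latest one.
        return max(self._ask_all(lambda s: s.read(key)), key=lambda r: r[0])

    def write(self, key, value):
        version, _ = self.read(key)  # assumption: versions order the writes
        self._ask_all(lambda s: s.write(key, version + 1, value))
```

Because any two of three servers always share at least one member with any other two, a read that reaches two servers sees the most recent completed write even if the third server is down.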


10.1 Introduction 


The Programming Systems Research Group has made progress in two areas during the 
1988-89 year. The group has worked on a new programming model for parallel computation 
involving the notion of an effect system. Furthermore, the group has enhanced the function 
of the distributed database system that we have designed and implemented. 


The prototype language FX embodies a new programming model for parallel computation that 
combines the good features of both imperative and functional programming languages. It 
uses an effect system to investigate the use of effect specifications in controlling concurrency. 
An effect system is analogous to a type system (as found in many programming languages); 
but whereas types describe what results from a computation, effects classify how the com- 
putation proceeds. Our system deduces information that will enable the efficient parallel 
implementation of a broad class of polymorphic programming languages. 
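

The analogy between types and effects can be made concrete with a toy checker, written here in Python rather than FX. Everything in it is an assumption for illustration: the node names, the three effect labels (alloc, read, write), and the interference rule are not taken from the FX definition, and the sketch treats all mutable cells as one, which makes it more conservative than a real effect system needs to be.

```python
from dataclasses import dataclass

# A toy expression language. The node names and the three effect labels
# are illustrative assumptions, not FX syntax.

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class NewRef:      # allocate a mutable cell holding an int
    init: object

@dataclass(frozen=True)
class Get:         # read a cell
    ref: object

@dataclass(frozen=True)
class Set:         # write a cell
    ref: object
    value: object

PURE = frozenset()

def check(expr):
    """Return (type, effect): the type says what results, the effect says how."""
    if isinstance(expr, Const):
        return "int", PURE
    if isinstance(expr, Add):
        lt, le = check(expr.left)
        rt, re_ = check(expr.right)
        assert lt == rt == "int", "Add expects ints"
        return "int", le | re_
    if isinstance(expr, NewRef):
        t, e = check(expr.init)
        assert t == "int", "NewRef expects an int initial value"
        return "ref", e | {"alloc"}
    if isinstance(expr, Get):
        t, e = check(expr.ref)
        assert t == "ref", "Get expects a cell"
        return "int", e | {"read"}
    if isinstance(expr, Set):
        rt, re_ = check(expr.ref)
        vt, ve = check(expr.value)
        assert rt == "ref" and vt == "int", "Set expects a cell and an int"
        return "unit", re_ | ve | {"write"}
    raise TypeError(f"unknown expression: {expr!r}")

def may_run_in_parallel(a, b):
    """Conservative rule: refuse if one side writes and the other reads or writes."""
    ea, eb = check(a)[1], check(b)[1]
    touches = {"read", "write"}
    return not (("write" in ea and eb & touches) or ("write" in eb and ea & touches))

cell = NewRef(Const(0))
print(check(Add(Const(1), Get(cell))))
print(may_run_in_parallel(Get(cell), Get(cell)))             # True: two reads
print(may_run_in_parallel(Set(cell, Const(1)), Get(cell)))   # False: write vs read
```

The point of the sketch is that the effect is computed statically alongside the type, so a compiler can decide which subexpressions are safe to schedule concurrently without running the program.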


During the past year, our investigations with the prototype implementation have focused on: 


• the design and implementation of FX subsets, one of which is used in the graduate 
programming language course; 


• the design and testing of optimistic inference algorithms for side effect estimation of 
FX expressions; 


• the development of effect specifications for message passing concurrency and first class 
continuations. 


We also developed the Boston Community Information System (BCIS), continuing 
our work from last year. BCIS is a large scale information system that is in use at 
over 150 sites in the Boston area. The system was improved during the last year with the 
addition of an electronic mail interface to the text-based article retrieval system. The goal 
of our research with BCIS is to explore how the broadcast system architecture can be used 
to implement information systems which can support very large user populations, perhaps 
up to one million users. 


10.2 Community Information System 


During the 1988-89 year, we continued to run the Boston Community Information System 
experiment. This experiment provides New York Times and Associated Press news wires 
to our users. It consists of three main programs: a PC based version (BCIS), a TCP/IP 
program (Walter) and an electronic mail based system (The Clipping Service). Over the 
past year, over 200 Boston area homes, 40 Internet hosts and 50 electronic mail users have 
participated in our experiment. 


The experiment also led us to explore other platforms. An Apple Macintosh version 
of the service has been prototyped and an X Window System implementation of the 
Walter database query application has been completed. XWalter provides a “point-and-click” 
interface to novice users. Furthermore, Clipsend, the electronic mail portion of the 
experiment, is continuing to be improved to run more reliably and support a large user 
community. We already have users in all areas of the world, such as Japan, Switzerland, and 
France. We expect to be able to support well over 100 users by the end of next year. 


Our experiment to charge our PC participants five dollars per month for the broadcast service 
has been successful; over two-thirds of the user population continues to participate. During 
the past year, we have spent time exploring the benefits of the technology and improving the 
documentation for the existing services. The experimental data report analyzed the immense 
amount of feedback we received from our users, and found that the Boston Community 
Information System provides a useful complement to existing media forms and has proved 
valuable to the many users in the test population. 


10.3 FX Effect Analysis 


The notion of an effect system has been extended to deal with different aspects of compile-time 
analysis of programs: 


• An extension to introduce the so-called “control effects” has been developed. This 
technique allows the introduction of first class continuations in the FX-87 programming 
language. 


• Explicit parallelism can also be incorporated in a language that uses an effect system. 
A message-based extension to FX-87 has been designed and implemented on top of the 
experimental FX-87 Interpreter. 


A major redesign of the FX programming language is under way in the PSR Group: 


• A complete draft reference manual of this new version of FX has been written; Pierre 
Jouvelot is one of the co-editors of this specification. 


• This design introduces first class modules. Among them, a vector facility inspired by 
Fortran 8X and the Scan Model has been proposed. 


10.3.1 Reasoning about Continuations with Control Effects 


First class continuations add a great deal of expressive power to a programming language, as 
they permit the implementation of a wide variety of control structures, including jumps, error 
handlers, and coroutines. With this power come substantial semantic and implementational 
complexities. Thus it would be very useful to be able to identify precisely which expressions 
in a program use first class continuations and in what manner. 


We have developed a new static analysis method for first class continuations that uses an 
effect system to classify the control domain behavior of expressions in a typed polymorphic 
language. We introduce two new control effects, goto and comefrom, that describe the 
control flow properties of expressions. An expression that does not have a goto effect is said 
to be continuation-following because it will always call its return continuation. An expression 
that does not have a comefrom effect is said to be continuation-discarding because it will never 
preserve its return continuation for later use. Unobservable control effects can be masked 
by the effect system. Control effect soundness theorems guarantee that the effects computed 
statically by the effect system are a conservative approximation of the dynamic behavior of 
an expression. 
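

The two classifications above can be sketched as a small bottom-up analysis in Python. This is only an illustration under stated assumptions: the node names are hypothetical, and the sketch assumes that capturing a continuation is what introduces a comefrom effect and that invoking a captured continuation is what introduces a goto effect. It does not model effect masking or the soundness theorems.

```python
from dataclasses import dataclass

# Hypothetical toy syntax; not FX-87.

@dataclass(frozen=True)
class Lit:            # a constant; no control effect
    value: object

@dataclass(frozen=True)
class Seq:            # evaluate several expressions in order
    parts: tuple

@dataclass(frozen=True)
class Capture:        # capture the current continuation around body
    body: object

@dataclass(frozen=True)
class Invoke:         # transfer control to a previously captured continuation
    arg: object

def control_effects(expr):
    """Conservatively collect the control effects of an expression."""
    if isinstance(expr, Lit):
        return frozenset()
    if isinstance(expr, Seq):
        out = frozenset()
        for part in expr.parts:
            out |= control_effects(part)
        return out
    if isinstance(expr, Capture):
        return control_effects(expr.body) | {"comefrom"}   # assumed rule
    if isinstance(expr, Invoke):
        return control_effects(expr.arg) | {"goto"}        # assumed rule
    raise TypeError(f"unknown expression: {expr!r}")

def describe(expr):
    """Apply the two definitions from the text to the computed effects."""
    eff = control_effects(expr)
    return {
        "continuation_following": "goto" not in eff,
        "continuation_discarding": "comefrom" not in eff,
    }
```

An expression built only from Lit and Seq nodes is reported as both continuation-following and continuation-discarding, which is the case a compiler can schedule most freely.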


The effect system that we describe performs certain kinds of control flow analysis that were 
not previously feasible. This analysis can enable a variety of compiler optimizations, includ- 
ing parallel expression scheduling in the presence of complex control structures. This control 
effect system has been implemented in the context of the FX-87 programming language. 


10.3.2 Communication Effects for Message-based Concurrency 


Although a fair amount of parallelism can be automatically extracted from sequential 
programs by smart compilers, there are some problems for which an explicitly parallel algorithm 
is more natural to express and easier to implement efficiently. There are numerous parallel 
paradigms that can be added to an otherwise sequential language to fulfill that goal, such as 
message passing, systolic programming, and fork/join models. We have developed a message- 
based communication framework based on communication effects. Communication effects 
are used to describe the communication behavior of expressions in a typed polymorphic pro- 
gramming language. Concurrency occurs between processes connected by channels on which 
messages are transmitted. Communication operations are characterized by two operators, 
out and in, depending on whether a message has been sent or received. Synchronization 
is only allowed by message passing along shared channels; communication via mutation of 
global variables is strictly prohibited by our communication effect system, thus restricting 
the amount of nondeterminacy in user programs. 


Communication effects permit a programmer to express concurrency in a rather flexible way 
while preserving the correctness of implicit detection of parallelism and optimization by the 
compiler. This system is powerful enough to express many other parallel paradigms, like 
systolic arrays or pipes. This new concurrency framework has been implemented in the 
FX-87 programming language. 
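

The pipe paradigm mentioned above can be sketched at run time with Python threads and queues. This shows only the programming style (processes that share nothing and interact solely by sending and receiving on channels); it does not perform the static communication effect analysis, and the function names and the DONE sentinel are assumptions for illustration.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out_ch, items):
    for x in items:
        out_ch.put(x)          # "out": send a message on the channel
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    while True:
        x = in_ch.get()        # "in": receive a message from the channel
        if x is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(x * x)


def run_pipe(items):
    """Connect producer -> squarer -> caller with two channels."""
    a, b = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=producer, args=(a, list(items))),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for w in workers:
        w.start()
    results = []
    while True:
        x = b.get()
        if x is DONE:
            break
        results.append(x)
    for w in workers:
        w.join()
    return results


print(run_pipe([1, 2, 3]))  # [1, 4, 9]
```

Since each channel has a single sender and a single receiver and no global variable is mutated, the result does not depend on how the threads are interleaved.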


10.3.3 Polymorphism and Side Effects 


We have been engaged in a project to develop a programming model for parallel computation 
which combines the best features of imperative and functional programming languages. The 
effect system which we have developed is analogous to a type system, and provides an 
algebraic framework for describing the behavior of computations. 


As reported last year, a prototype implementation of the FX programming language has been 
built. Our investigations in the area of type reconstruction systems and our performance 
experiments have led us to investigate several new applications of effect systems: 


• a static checking system for abstract type constructors and destructors which facilitates 
the convenient use of modular data implementations; and 


• a more flexible static type reconstruction system which may permit memory 
representation optimizations. 


Type Reconstruction for Pattern Matching 


We have developed a typing system which permits data constructor and destructor proce- 
dures to be used as first class values; this typing system permits first class procedures to 
double as pattern-match operators and generalizes the notion of pattern-matching. 


Typechecking Polymorphic Expressions with Side Effects 


We have investigated a simple typing system which associates side effects with type variables 
in order to permit polymorphism in the types of some expressions which perform side effects. 


10.3.4 FX Large Project Programming 


We have continued work on the FX module system. FX modules allow abstract types, 
transparent types, and values to be packaged together into first class modules. A system of 
static dependent types guarantees type safety in the presence of first class modules. We use 
the FX effect system to guarantee type safety in the presence of side effects. Our system, 
combined with a simple facility for reading (module) values from files, obviates the need for 
a separate linking language (as in ML or Clu). 


This past year, we integrated this module system into our new prototype FX implementation. 
Our type reconstruction system allows many declarations within modules to be omitted and 
permits clients of modules to benefit from implicit polymorphism of values exported by 
modules. 


Our implementation packages the standard types and operations into a large FX module 
that is available to the programmer. This allows our language design and implementation to 
be more modular by separating the language kernel from these standard language features. 
Modules also provide a convenient way to experiment with new language features as well as 
new versions of older features. 


We are continuing to refine the module system design by investigating: 


• more concise methods of combining modules to form larger ones, 


• support for common idioms (like ML’s datatype), and 


• support for persistent module storage. 


11.1 Introduction 


Spoken language input to computers is a major goal in our research in developing a graceful 
human-machine interface. Despite some recent successful demonstrations of speech recogni- 
tion capabilities, current systems typically fall far short of human capabilities of continuous 
speech recognition with essentially unrestricted vocabulary and speakers, under difficult 
acoustic environments. Our approach to this problem is to seek a good understanding of 
human communication through spoken language, to capture the essential features of the 
process in appropriate models, and to develop the necessary computational framework to 
make use of these models for machine understanding. 


It is our belief that the development of advanced human/machine communication systems 
will require expertise in signal processing, system theory, pattern recognition, and computer 
science, built on a solid understanding of speech science and linguistics. We place heavy 
emphasis on designing systems that can make use of the knowledge gained over the past 
four decades in human communication, with the hope that such systems will one day have a 
performance approaching that of humans. Specifically, our approach is based on the following 
premises: 


• The speech signal contains information regarding the intended linguistic message. It 
also contains information on the acoustic environment and the identity and 
physiological/psychological states of the speaker. As far as speech recognition is concerned, the 
latter sources of information can be considered as undesirable noise. Robust speech 
recognition is critically tied to our ability to successfully extract the linguistic 
information and discard those aspects that are extra-linguistic. 


• Past research in spoken language communication has established phonemes as 
psychologically real units for representing words in the lexicon. Therefore, phonemes and 
other equivalent descriptors, such as distinctive features and syllables, are the most 
appropriate units to relate words to the speech signal for machine recognition as well. 


• While phonemes are discrete abstract linguistic entities, their acoustic realizations in 
speech are inherently continuous, reflecting the movement of the articulators from one 
position to the next. Many of the acoustic cues for phonetic contrasts are encoded at 
specific times in the speech signal. In order to fully utilize these acoustic attributes, 
we believe that one must explicitly establish acoustic landmarks in the signal. 


• Previous attempts at explicit utilization of speech knowledge have resulted in the 
development of systems that are based on heuristic rules. Such efforts typically require 
intense knowledge engineering, and as such are often hampered by the lack of a unified 
control strategy. As a result, system development is slow, and the performance fragile. 
In contrast, we seek to make use of the available speech knowledge by embedding such 
knowledge in a formal framework whereby powerful mathematical tools can be utilized 
to optimize its use. 


• Despite significant advances made in phonetics, phonology, and other aspects of 
linguistics over the past decades, we still lack a complete understanding of the human speech 
communication process. To deal with our present state of ignorance and the inherent 
variability that exists throughout the process, the speech recognition system must have 
a stochastic component. However, it is our belief that speech-specific knowledge will 
enable us to build more sophisticated stochastic models than what is currently being 
attempted, and to reduce the amount of training data necessary for high performance. 


• The ultimate goal of our research is the understanding of the spoken message, and the 
subsequent accomplishment of a task based on this understanding. To achieve this 
goal, we must fully integrate the speech recognition part of the problem with natural 
language processing so that higher level linguistic and pragmatic constraints can be 
utilized. 


• The development of a spoken language understanding system will require interactions 
with several disciplines in computer science. Parallel computing will be necessary for 
real time processing. Efficient algorithms can greatly reduce the search space for the 
recognition process. Finally, theories of learning will help the system to adapt to new 
speakers, environments, and tasks. 


The research projects in the Spoken Language Systems Group fall into several areas. First, 
a number of basic research topics are being explored. These include the formulation and 
testing of various computational models for human auditory processing, speech perception, 
and natural language processing, suitable for spoken language understanding. We are also 
attempting to quantify the acoustic cues for phonetic contrasts, and the effects of speaking 
rate and style on the acoustic properties of speech. Secondly, these research results are 
funneled into the development of an experimental spoken language system. Thirdly, alternative 
approaches to speech recognition, including the use of artificial neural nets and strategies 
derived from vision research, are being explored. Finally, part of our effort is devoted to the 
development of the necessary infrastructure, including the development of speech research 
tools and databases. 


The Spoken Language Systems Group was formed in January 1989, with members drawn 
from the Speech Communication Group at the Research Laboratory of Electronics. While 
the research described herewith was conducted primarily at LCS and is intended to cover 
only the six-month period since January, some overlap with our earlier research activities 
at RLE is unavoidable. 


11.2 Research Reports 


11.2.1 Continuous Speech Recognition: The SUMMIT System 


Recently, we have put together a speech recognition system which embodies some of the 
research that we have been conducting in automatic speech recognition. The system, which 
we call SUMMIT, is intended to serve as a testbed for a segmental-based approach to speech 
recognition. In addition, it enables us to explore how speech recognition can be integrated 
with natural language processing in order to achieve speech understanding. 


Figure 11.1: Intermediate representation leading to the recognition of the sentence, “Where 
is the nearest hospital?” The display contains: (a) synchrony spectrogram, (b) a dendrogram 
describing the multi-level acoustic segmentation, (c) a phonetic recognition network, (d) a 
word pronunciation network, and (e) the recognition result. 


The SUMMIT system starts the recognition process by first transforming the speech signal 
into a representation that models the known properties of the human auditory system [283]. 
The representation is illustrated in Figure 11.1(a), for the sentence “Where is the nearest 
hospital?” Using the output of the auditory model, acoustic landmarks of varying robustness 
are located and embedded in a hierarchical structure called a dendrogram [123], as shown 
in Figure 11.1(b). The acoustic segments in the dendrogram are then mapped to phoneme 
hypotheses, using a set of automatically determined acoustic parameters in conjunction with 
conventional pattern recognition algorithms [264]. The result is a phoneme network, in which 
each arc is characterized by a vector of probabilities for all the possible candidates, as shown 
in Figure 11.1(c). 


Words in the lexicon are represented as pronunciation networks, which are generated auto- 
matically by a set of phonological rules. This is illustrated in Figure 11.1(d) for the word 
“hospital.” Probabilities derived from training data are assigned to each arc to reflect the 
likelihood of a particular pronunciation. Presently, lexical decoding is accomplished by using 
the Viterbi algorithm to find the best path that matches the acoustic-phonetic network with 
the lexical network. The recognized word string is shown in Figure 11.1(e). 
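

The lexical decoding step can be illustrated with a minimal Viterbi sketch in Python. It is a simplification, not the SUMMIT decoder: it assumes exactly one acoustic segment per network state (no insertions or deletions), and the tiny network, phone labels, and probabilities in the usage lines are invented for illustration.

```python
import math


def viterbi(observations, start, arcs, labels, finals):
    """
    observations: list of {phone: probability}, one dict per acoustic segment
    start:        {state: probability of starting there}
    arcs:         {state: {next_state: probability}}  (pronunciation network)
    labels:       {state: phone emitted by that state}
    finals:       set of states where a word may end
    Returns (log probability, best state sequence).
    """
    def score(state, obs):
        p = obs.get(labels[state], 0.0)
        return math.log(p) if p > 0 else -math.inf

    # best[s] = (log prob of the best path ending in s, that path)
    best = {}
    for s, p in start.items():
        best[s] = (math.log(p) + score(s, observations[0]), [s])

    for obs in observations[1:]:
        nxt = {}
        for s, (lp, path) in best.items():
            for t, p in arcs.get(s, {}).items():
                cand = lp + math.log(p) + score(t, obs)
                # Keep only the best-scoring way of reaching each state.
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, path + [t])
        best = nxt

    ends = [v for s, v in best.items() if s in finals]
    return max(ends, key=lambda v: v[0]) if ends else (-math.inf, [])


# Hypothetical two-pronunciation network: state 0 then either state 1 or state 2.
labels = {0: "dh", 1: "ah", 2: "iy"}
start = {0: 1.0}
arcs = {0: {1: 0.7, 2: 0.3}}
segments = [{"dh": 0.9, "th": 0.1}, {"ah": 0.4, "iy": 0.6}]
log_p, path = viterbi(segments, start, arcs, labels, finals={1, 2})
print(path, math.exp(log_p))
```

In the usage lines the second segment slightly prefers "iy" acoustically, but the higher arc probability of the "ah" pronunciation makes the path through state 1 the overall winner (0.9 x 0.7 x 0.4 against 0.9 x 0.3 x 0.6), which is the kind of trade-off the arc probabilities are meant to capture.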


Figure 11.2: Rank order statistics for the current phone classifier on a speaker-independent 
task. There are 38 context-independent phone labels: 14 vowels, 3 semivowels, 3 nasals, 8 
fricatives, 2 affricates, 6 stops, 1 flap, and one for silence. 


We recently evaluated SUMMIT’s performance in a number of ways. Phonetic classification 
performance was evaluated by comparing the labels provided by the classifier to those in a 
time-aligned transcription, using 38 context-independent phone labels [318]. This particular 
set was selected because it has been used in other recent evaluations within the DARPA 
community. For a single speaker, the top-choice classification accuracy was 77%. The correct 
label is within the top three nearly 95% of the time. For multiple and unknown speakers, 
the top-choice accuracy is about 70%, and the correct choice is within the top three over 
90% of the time. Figure 11.2 shows the rank order statistics for both the speaker-dependent 
and speaker-independent cases. 
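

Rank order statistics of this kind can be computed as in the following Python sketch. The function name and the four-token data set are made up for illustration; they are not the evaluation data described above.

```python
def top_n_accuracy(ranked_hypotheses, reference_labels, n):
    """Fraction of segments whose reference label is among the classifier's first n choices."""
    hits = sum(
        1
        for ranked, ref in zip(ranked_hypotheses, reference_labels)
        if ref in ranked[:n]
    )
    return hits / len(reference_labels)


# Invented example: each inner list is a classifier's ranked output for one segment.
ranked = [
    ["aa", "ae", "ah"],
    ["s", "z", "sh"],
    ["m", "n", "ng"],
    ["iy", "ih", "eh"],
]
reference = ["aa", "z", "ng", "uw"]
print(top_n_accuracy(ranked, reference, 1))  # 0.25
print(top_n_accuracy(ranked, reference, 3))  # 0.75
```

Top-choice accuracy is the n = 1 case; plotting the value for n = 1, 2, 3, ... gives a rank order curve like the one in Figure 11.2.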


Word accuracy for the SUMMIT system was evaluated during February on the DARPA 
1000-word Resource Management task [319]. Two different speaker-independent test sets 
provided by NIST, consisting of 150 and 300 sentences, respectively, were used [252]. The 
SUMMIT system achieved a word accuracy of 87% on both test sets, using the designated 
word-pair grammar with perplexity of 60, and approximately 70 context-independent phone 
models. SUMMIT’s performance compares favorably with systems that are based on hidden 
Markov modeling, when evaluated on the same data and using a similar number of phone 
models [200]. Since other researchers have been able to improve their system’s performance 
by increasing the number of models to accommodate context-dependency, we expect that 
we can similarly improve SUMMIT’s performance. 


Currently the SUMMIT system is implemented on a Symbolics Lisp Machine augmented 
with an FPS Array processor, and runs in several hundred times real time. Over the next 
year, we will begin to port the system to a faster platform in conjunction with developing 
dedicated hardware to achieve near real time performance. 


11.2.2 Natural Language Processing: The TINA System 


A new natural language system, TINA, has been developed in our group [284] which inte- 
grates key ideas from context free grammars, Augmented Transition Networks (ATN’s) [311], 
and Lexical Functional Grammars (LFG’s) [66]. TINA is specifically designed to accommo- 
date full integration between speech recognition and natural language processing, and has a 
set of features reflecting this philosophy. 


The grammar begins with a set of context-free rewrite rules, which are augmented with 
parameters to enforce syntactic and semantic constraints. These rules are converted auto- 
matically to a network form, leading to extensive structure sharing. All arcs in the network 
have associated probabilities, which can be trained automatically from a set of parsed sen- 
tences. The parser uses a best-first search strategy. Control includes both top-down and 
bottom-up cycles, and key parameters are passed among nodes to deal with long-distance 
movement and agreement constraints. The probabilities provide a natural mechanism for 
exploring more common grammatical constructions first. TINA also includes a new strategy 
for dealing with movement, which can efficiently handle nested and chained gaps, and rejects 
crossed gaps. 
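

The way arc probabilities steer a best-first search toward common constructions can be sketched generically in Python. This is a plain best-first path search over a small probabilistic network, not the TINA parser: it has no top-down and bottom-up cycles, no parameter passing, and the node names and probabilities are hypothetical. It also assumes the network is acyclic.

```python
import heapq
import math


def best_first_paths(arcs, start, goal, limit=3):
    """
    Return up to `limit` (probability, path) pairs from start to goal,
    most probable first.
    arcs: {node: [(next_node, probability), ...]}
    """
    # Heap entries are (negative log probability, path); the smallest cost,
    # i.e. the most probable partial path, is always expanded next.
    heap = [(0.0, [start])]
    found = []
    while heap and len(found) < limit:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            found.append((math.exp(-cost), path))
            continue
        for nxt, p in arcs.get(node, []):
            heapq.heappush(heap, (cost - math.log(p), path + [nxt]))
    return found


# Hypothetical network; the category names are placeholders.
arcs = {
    "S": [("NP", 0.7), ("VP", 0.3)],
    "NP": [("END", 0.6), ("VP", 0.4)],
    "VP": [("END", 1.0)],
}
for prob, path in best_first_paths(arcs, "S", "END"):
    print(round(prob, 2), path)
```

Because every arc probability is at most one, extending a path never lowers its cost, so complete paths come off the heap in order of decreasing probability.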


Over the past few months, TINA has been ported to the DARPA 1000-word Resource 
Management task. We used the 791 designated training sentences and 200 (unseen) test 
sentences to evaluate our parser for coverage and perplexity. The training was a two-step 
process. We first expanded the coverage of the grammar until it could handle all of the 791 
training sentences (100% coverage). We then built a new subgrammar from these sentences, 
with probabilities on arcs updated according to their usage within the training set (any rules 
that only appeared in the TIMIT domain were automatically discarded). This resulted in 
a grammar that was tightly defined for the RM task. We then tested this grammar for 
coverage and perplexity on the 200 test sentences. The results were that 84% of the test 
sentences were parsable, and the perplexity was 368 if all words that could follow each word 
were considered to be equally likely. The surprising result was that the perplexity dropped 
9-fold when arc probabilities were incorporated into the measurement, down to 41.5. We also 
looked at the parses to establish the depth from the top of the correct parse. We found that 
88% of the training sentences gave a correct parse as the first choice; this number increased 
to 90% for the test sentences. Both sets gave the correct parse within the top three 98% of 
the time. 
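

The two perplexity figures differ only in the probability assigned to each word. A short Python sketch of the standard test-set perplexity computation makes the comparison concrete; the probabilities in the usage lines are invented, not taken from the Resource Management experiment.

```python
import math


def perplexity(word_probabilities):
    """Geometric-mean inverse probability: exp(-(1/N) * sum(log p_i))."""
    n = len(word_probabilities)
    return math.exp(-sum(math.log(p) for p in word_probabilities) / n)


# Invented example: at three points in a sentence the grammar allows
# 4, 2 and 8 next words respectively.
uniform = [1 / 4, 1 / 2, 1 / 8]   # all allowed successors equally likely
trained = [0.5, 0.9, 0.4]         # hypothetical trained arc probabilities
print(perplexity(uniform))        # cube root of 4 * 2 * 8, about 4.0
print(perplexity(trained))        # lower, because likely words get more mass
```

With equally likely successors the perplexity is the geometric mean of the branching factor; trained probabilities reduce it whenever the words that actually occur are the ones the grammar considers likely. For the figures reported above, 368 / 41.5 is approximately 8.9, which is the "9-fold" reduction.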


11.2.3 Spoken Language Understanding: The VOYAGER System 


Over the past three months, we initiated an effort in spoken language understanding. The 
project is motivated by our belief that many of the applications suitable for human/machine 
interaction using speech typically involve interactive problem solving. That is, in addition 
to converting the speech signal to text, the computer must also understand the linguistic 
structure of a sentence in order to generate the correct response. 


In order to explore issues related to a fully-interactive spoken language system, we selected 
a task in which the system knows about the physical environment of a specific geographical 
area, and can provide assistance on how to get from one location to another within this area. 
The system, which we call Voyager, can also provide information concerning certain objects 
located inside this area. The current version of Voyager focuses on the geographic area of 
the city of Cambridge between MIT and Harvard University, and can answer a number of 
different types of questions about certain hotels, restaurants, hospitals, and other objects 
within this region. 


Voyager is made up of three components. The first component, SUMMIT, converts the 
speech signal into a set of word hypotheses. The natural language component, TINA, then 
provides a linguistic interpretation of the set of words. The parse generated by the natural 
language component is then transformed into a set of query functions, which are passed 
to the backend for response generation. The backend is an enhanced version of the di- 
rection assistance program developed by Jim Davis of the Media Laboratory at MIT. The 
response generator maintains some knowledge about recent discourse history, which allows 
it to respond appropriately to queries such as “How do I get there?” Currently, Voyager can 
generate responses in the form of text, graphics, and synthetic speech. 


As of now, Voyager has a vocabulary of approximately 400 words, and it can deal with about 
half a dozen types of queries, such as the location of objects, simple properties of objects, 
how to get from one place to another, and the distance and time for travel between objects. 
Within this limited domain of knowledge, it is our hope that Voyager will be able to handle 
any reasonable query that a native speaker is likely to initiate. As time progresses, Voyager’s 
knowledge base will undoubtedly grow. 


11.2.4 Isolated Word Recognition over Telephone Networks 


Over the past few months, we initiated an effort to develop a small-vocabulary, isolated- 
word recognition system. The focus of this research is to explore how our phonetically- and 
segmentally-based approach will fare with the bandlimited and distorted speech transmitted 
through local and long distance telephone networks, spoken by real users. 


As a first step, we selected the task of recognizing a small set of city names. We have 
implemented such a system, and have begun some preliminary evaluations, using data collected 
by NYNEX Corporation. We are also using this simple task as a framework in which to 
explore the use of unsupervised learning techniques to enable the automatic expansion of 
the vocabulary. 


11.3 Student Reports 


Nancy Daly 


During the spring semester, Daly spent most of her time working as a teaching assistant for 
a new course on automatic speech recognition introduced by Victor Zue. Over the next few 
months, she plans to take her area exam and work on her doctoral thesis, which is in the 
area of prosodic aids for speech recognition. 


Prosody is the stress, rhythm, and intonation of speech. While the importance of prosodic 
information has long been documented for human speech communication, automatic speech 
recognition systems developed as of now have all but ignored this source of information. The 
purpose of her thesis research is to see how prosodic information could be incorporated into 
speech recognition systems to improve their performance. 


One of the first projects is to determine what form of prosodic information can be reliably 
extracted from the speech signal in the absence of any segmental information. Specifically, 
she is investigating whether stressed syllables can be identified reliably by native listeners 
when the phoneme identity has been removed from the speech signal through inverse filtering. 
This line of investigation can lead to the determination of the stress pattern of words, and 
the use of this knowledge to aid phonetic recognition and lexical access. 


Rob Kassel 


Kassel is pursuing a Master’s thesis on the use of distinctive features for lexical access. 
Distinctive features have been proposed by many as a sub-phoneme linguistic unit. Feature 
spreading can concisely represent the allophonic variation found in spoken language, an 
attractive property for speech recognition systems. He began a study to determine the 
expressive power of distinctive features in terms of information theoretical measures. 


Hong Leung 


Leung just completed his Ph.D. thesis entitled “The Use of Artificial Neural Networks for 
Phonetic Recognition.” One of the major problems with current speech recognition systems 
is that either the system’s self-organizing framework is very powerful but too rigid for 
incorporating more human knowledge about speech, or there is a significant amount of human 
knowledge in the system but the control strategy is not powerful enough. Due to their flexible 
self-organizing framework, artificial neural networks (ANN’s) can potentially bridge the gap 
between our knowledge in speech and powerful self-organizing mechanisms. Leung’s thesis 
is concerned with the use of ANN’s for phonetic recognition. There are three major objec- 
tives. First, by investigating ANN’s in order to gain a better understanding of their basic 
characteristics and capabilities, we may be able to exploit them more fully as pattern clas- 
sifiers. Secondly, by properly applying our acoustic-phonetic knowledge, we can potentially 
enhance the flexible framework of ANN’s for phonetic recognition. Thirdly, by comparing 
them with traditional pattern classification techniques, we can better understand the merits 
and shortcomings of the different approaches. 


The multi-layer perceptron (MLP) was selected for his investigation, which centered around 
a set of vowel recognition experiments. In order to isolate different sources of variability in 
the speech signal, four different databases were used for our study. The largest database 
consists of 22,000 vowel tokens extracted from continuous sentences in the TIMIT database, 
spoken by 550 male and female speakers. The performance of the network was evaluated 
in several ways. Evaluation in terms of average agreement with the phonetic transcription 
suggests that the performance of the network compares favorably to human performance in 
perceptual experiments. Evaluation along the phonological dimension suggests that most of 
the confusions between the network and transcription labels are quite reasonable. 


Next, the characteristics and representations of the MLP were explored. Specifically, he 
examined the performance of the network as a function of the number of training iterations, 
amount of training data, number of hidden units, number of hidden layers, and use of the 
nonlinear sigmoid function. He also discussed the structure and self-organization of the 
internal representations, choices for output representations, and the use of heterogeneous 
input representations. Other issues discussed include error metrics for training the network, 
initializations of the network, and rapid adaptation of the network to a new speaker. 


Finally, the performance of the network was compared with that of two traditional clas- 
sification techniques. For the vowel classification task, experiments demonstrate that the 
MLP can yield higher performance than k-nearest neighbor and Gaussian classifiers. The 
results suggest that the MLP can provide an effective alternative for pattern classification, 
especially if the classification problem is not well understood.


Jeffrey Marcus 


Marcus has been working on incorporating speech units of different sizes (e.g., phoneme, 
diphone, word) in a speech recognition system. In addition, he is considering schemes for 
sharing information among speech units which have certain phonetic similarities so that 
parameter estimates for these models are improved. Another major goal of his work is to 
advance recognizer design methodology by demonstrating the utility of statistical and data 
analytic techniques which have not been applied previously. 


The work is currently focused on modeling function words such as “the” and “and,” since 
they vary greatly acoustically and cause a disproportionate number of recognizer errors. In 
the future, these techniques will be extended to other lexical and phonetic units. 


Helen Meng 


Meng joined the group in January and spent the past semester finishing her Bachelor’s thesis
and building up background in speech by taking the courses 6.979, Automatic Speech
Recognition, and 6.541J, Speech Communication. In addition, she attended spectrogram
reading sessions run in the Spoken Language Systems Group. She also wrote a term paper
titled “An Acoustic Study of the Semi-Vowel /l/,” which reports a study of
prevocalic, intervocalic, and postvocalic /l/’s in data collected by Dennis Klatt. Over
the next few months, she will be familiarizing herself with the computational facilities in the
group, as well as searching for a topic for her Master’s thesis.


John F. Pitrelli 


Pitrelli has been studying phoneme durations in order to develop a duration model to aid 
speech recognition. Duration is potentially a strong cue for certain phonemic distinctions, 
including inherently long vs. short vowels, and voiced vs. unvoiced obstruent consonants.
Phoneme durations are affected, though, by an abundance of factors ranging from detailed 
phonetic context effects to syntax and semantics. Our lack of understanding of these effects 
and their interactions hinders our use of potentially useful duration information to the extent 
that most speech recognition systems currently use only rudimentary duration models or use 
time-warping procedures, which distort duration information. 


Recent research has focused on two facets of the duration modeling problem. One is the 
completion and evaluation of a hierarchical model accounting for discrete-valued factor vari- 
ables, such as phonetic context and syntactic-unit-final lengthening. The other task has 
been a preliminary exploration of the effects of speaking-rate variations on phoneme du- 
ration. Future work includes the continuation of the speaking-rate experiments, with the 
goal of improving understanding of rate effects on duration, both gradual, such as vowel 
compression, and abrupt, such as flapping of alveolar stops. Following these experiments,
the hierarchical duration model will be augmented by the incorporation of an appropriate
function of speaking rate.


Dimitry Rtischev 


Rtischev joined the group in January. Over the past five months, he worked on an inter- 
active software facility for simulation of hidden Markov models. The completed program, 
named HIMARK, provides a flexible experimental environment for constructing, training, 
and observing hidden Markov models and using them for various speech recognition tasks. 
HIMARK formed the basis for two lab assignments which he prepared for 6.979, Automatic 
Speech Recognition. Dimitry’s plans for the next year include research in applying statisti- 
cal methods such as HMM for speech synthesis and preparing for the Preliminary Written 
Examination and Oral Exam. 


Michal Soclof 


Soclof joined the group in January, and spent the spring semester learning about automatic 
speech recognition by taking the speech recognition and spectrogram reading courses, and by 
reading relevant material. In addition, she learned about the SUMMIT system and became 
familiar with the computational facilities in the group. During the upcoming months, she 
will be working on her Master’s thesis research. A potential topic which she is investigating 
is the problem of detecting speech in the presence of other vocalizations. This entails being 
able to distinguish between a speech event and a non-speech event such as throat clearing or 
coughing. She will be studying what makes the two events different and possible methods 
for distinguishing them. 


Sean Trowbridge


Trowbridge joined the group in January, and spent the spring term mainly getting oriented,
learning about speech recognition in general, and spectrogram reading in particular. He also
helped out with some grading for 6.979, and started preliminary research for his Master’s
thesis. In the coming year, he will be working with Steve Ward on a thesis that involves the
NuMesh computer and its application to speech recognition. He will be designing something 
resembling a compiler for the machine, which will take a computation specification, along 
with some resource constraints, and produce a topology and timing for each processor that 
will perform the computation specified. 
12.1 Introduction


The Systematic Program Development Group, though small, has a diverse set of interests. These
include programming methodology, programming multiprocessors, specification languages 
(in conjunction with researchers at the Digital Equipment Corporation), circuit verification 
(in conjunction with researchers at the Technical University of Denmark), automatic theorem 
proving, and high performance garbage collection. 


12.2 The Larch Family of Specification Languages 


The Larch family of specification languages supports a two-tiered definitional approach to 
specification. Each specification has components written in two languages: one designed for 
a specific programming language and another independent of any programming language. 
The former are called Larch interface languages, and the latter the Larch Shared Language 
(LSL). 


Larch interface languages are used to specify the interfaces between program components. 
Each specification provides the information needed to use the interface and to write programs 
that implement it. A critical part of each interface is how the component communicates 
with its environment. Communication mechanisms differ from programming language to 
programming language, sometimes in subtle ways. We have found it easier to be precise 
about communication when the interface specification language reflects the programming 
language. Specifications written in such interface languages are generally shorter than those 
written in a “universal” interface language. They are also clearer to programmers who 
implement components and to programmers who use them. 


Each Larch interface language deals with what can be observed about the behavior of components
written in a particular programming language. It incorporates programming-language-specific
notations for features such as side effects, exception handling, iterators, and concurrency.
Its simplicity or complexity depends largely upon the simplicity or complexity of
the observable state and state transformations of its programming language. Figure 12.1
contains a sample interface specification for a CLU procedure in a window system.


addWindow = proc (v: View, w: Window, c: Coord) signals (duplicate)
modifies v
ensures v' = addW(v, w, c)
except when w ∈ v signals duplicate ensures v' = v


Figure 12.1: Sample Larch/CLU Interface Specification


Larch Shared Language specifications are used to provide a semantics for the primitive terms
used in interface specifications. Specifiers are not limited to a fixed set of primitive terms, but
can use LSL to define specialized vocabularies suitable for particular interface specifications.
For example, an LSL specification would be used to define the meaning of the symbols ∈
and addW in Figure 12.1, thereby precisely answering questions such as what it means for
a window to be in a view (visible or possibly obscured?), or what it means to add a window
to a view that may contain other windows at the same location.


The Larch approach encourages specifiers to keep most of the complexity of specifications in 
the LSL tier for several reasons: 


• LSL abstractions are more likely to be re-usable than interface specifications.


• LSL has a simpler underlying semantics than most programming languages (and hence
than most interface languages), so that specifiers are less likely to make mistakes.


• It is easier to make and check claims about semantic properties of LSL specifications
than about semantic properties of interface specifications.


12.3 The LP Theorem Proving System


LP has changed dramatically in the last year. We take this opportunity to present a fairly 
detailed overview of its current capabilities. 


The basis for proofs in LP is a logical system consisting of equations, rewrite rules, operator 
theories, induction rules, and deduction rules, all expressed in a multisorted fragment of 
first-order logic. A logical system in LP is closely related to an LSL theory, but is handled 
in somewhat different ways, both because axioms in LP have operational content as well as 
semantic content and because they can be presented to LP incrementally, rather than all at 
once. 


12.3.1 Declarations 


Sorts, operators, and variables play exactly the same roles in LP as they do in LSL. They 
must be declared, and operators can be overloaded. The syntax for operators at the moment 
is not as rich as in LSL, but we plan to rectify that. Unlike LSL, LP at present provides no 
scoping for variables. 


12.3.2 Equations and Rewrite Rules 


LP is based on a fragment of first-order logic in which equations play a prominent role. 
Some of LP’s inference mechanisms work directly with equations. Most, however, require 
that equations be oriented into rewrite rules, which LP uses to reduce terms to normal forms. 
It is usually essential that the rewriting relation be terminating, i.e., that no term can be 
rewritten infinitely many times. LP provides several mechanisms that automatically orient 
many sets of equations into terminating rewriting systems. For example, in response to the 
commands 
declare sort G
declare variables x, y, z: G
declare operators e: → G, i: G → G, __*__: G, G → G


assert
(x * y) * z == x * (y * z)
e == x * i(x)
e * x == x


that enter the usual first-order axioms for groups, LP produces the rewrite rules


(x * y) * z → x * (y * z)
x * i(x) → e
e * x → x.


It automatically reverses the second equation to prevent nonterminating rewriting sequences
such as e → e * i(e) → i(e) → i(e * i(e)) → i(i(e)) → ... The discussion of operator theories,
below, treats the issue of termination further.


A system’s rewriting theory (i.e., the propositions that can be proved by reduction to normal 
form) is always a subset of its equational theory (i.e., the propositions that follow logically 
from its equations and from its rewrite rules considered as equations). The proof mechanisms 
discussed below compensate for the incompleteness that results when, as is usually the case, 
a system’s rewriting theory does not include all of its equational theory. In the case of group 
theory, for example, the equation e == i(e) follows logically from the second and third 
axioms, but is not in the rewriting theory of the three rewrite rules (because it is irreducible 
and yet is not an identity). 
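

The gap between the rewriting theory and the equational theory can be illustrated with a small
sketch in Python. It is not LP's implementation: the tuple encoding of terms and the helper
names (match, substitute, apply_at_root, normalize) are assumptions made only for this example.
The sketch applies the three group rewrite rules and shows that the term e * i(e) can be rewritten
to either e or i(e), both of which are irreducible.

```python
# Terms: a variable (or opaque symbol) is a str; an application is a tuple
# (operator, arg1, ...). This encoding is hypothetical, chosen for the sketch.
E = ("e",)

def mul(a, b):
    return ("*", a, b)

def inv(a):
    return ("i", a)

RULES = [
    (mul(mul("x", "y"), "z"), mul("x", mul("y", "z"))),  # (x * y) * z -> x * (y * z)
    (mul("x", inv("x")), E),                              # x * i(x) -> e
    (mul(E, "x"), "x"),                                   # e * x -> x
]

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by subst equals term, else None."""
    if isinstance(pattern, str):  # pattern variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def substitute(term, subst):
    """Replace the variables of term according to subst."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(a, subst) for a in term[1:])

def apply_at_root(rule, term):
    """Rewrite term with one rule at the root, or return None if it does not match."""
    lhs, rhs = rule
    subst = match(lhs, term, {})
    return None if subst is None else substitute(rhs, subst)

def rewrite_once(term):
    """Perform one rewrite step (root first, then arguments); None if irreducible."""
    for rule in RULES:
        result = apply_at_root(rule, term)
        if result is not None:
            return result
    if not isinstance(term, str):
        for k, arg in enumerate(term[1:], start=1):
            new_arg = rewrite_once(arg)
            if new_arg is not None:
                return term[:k] + (new_arg,) + term[k + 1:]
    return None

def normalize(term):
    """Rewrite until no rule applies (terminates for these three rules)."""
    while True:
        new_term = rewrite_once(term)
        if new_term is None:
            return term
        term = new_term

overlap = mul(E, inv(E))                      # e * i(e)
via_rule_2 = apply_at_root(RULES[1], overlap)  # e
via_rule_3 = apply_at_root(RULES[2], overlap)  # i(e)
```

Because e and i(e) are distinct normal forms of the same term, the equation e == i(e) holds in
the equational theory but cannot be proved by normalization alone.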


LP provides builtin rewrite rules to simplify terms involving the Boolean operators ¬, &,
|, ⇒, and ⇔, the equality operator =, and the conditional operator if. These rewrite rules
are sufficient to prove many, but not all, identities involving these operators. Unfortunately, 
the sets of rewrite rules that are known to be complete for propositional calculus require 
exponential time and space. Furthermore, they can expand, rather than simplify, proposi- 
tions that do not reduce to identities. These are serious drawbacks, because when we are 
debugging specifications we often attempt to prove conjectures that are not true. So none 
of the complete sets of rewrite rules is built into LP. Instead, LP provides proof mechanisms 
that can be used to overcome incompleteness in a rewriting system, and it allows users to 
add any of the complete sets they choose to use. 


LP treats the equations true == false and z == t, where t is a term not containing the 
variable z, as inconsistent. Inconsistencies can be used to establish subgoals in proofs by 
cases and contradiction. If they arise in other situations, they indicate that the axioms in 
the logical system are inconsistent. 
12.3.3 Operator Theories


LP provides special mechanisms for handling equations such as x + y == y + x that cannot
be oriented into terminating rewrite rules. The LP command assert ac + says that + is 
associative and commutative. Logically, this assertion is merely an abbreviation for two 
equations. Operationally, LP uses it to match and unify terms modulo associativity and 
commutativity. This not only increases the number of theories that LP can reason about, 
but also reduces the number of axioms required to describe various theories, the number of 
reductions necessary to derive identities, and the need for certain kinds of user interaction, 
e.g., case analysis. The main drawback of term rewriting modulo operator theories is that 
it can be much slower than conventional term rewriting. 
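

As a rough illustration of what working modulo associativity and commutativity involves,
the Python sketch below decides whether two terms are equal modulo AC by flattening nested
applications of an AC operator and sorting the arguments. This is only the equality test, which
is much simpler than the AC matching and unification that LP performs; the term encoding and
the function names are assumptions made for the example.

```python
# Terms: a str is a variable or constant; a tuple is (operator, arg1, ...).
# The encoding and helper names are hypothetical, not part of LP.

def canonical(term, ac_ops=frozenset({"+"})):
    """Return a canonical form: AC operators are flattened and their arguments sorted."""
    if isinstance(term, str):
        return term
    op = term[0]
    args = [canonical(a, ac_ops) for a in term[1:]]
    if op in ac_ops:
        flat = []
        for a in args:
            if not isinstance(a, str) and a[0] == op:
                flat.extend(a[1:])   # associativity: drop the nested parentheses
            else:
                flat.append(a)
        args = sorted(flat, key=repr)  # commutativity: argument order is irrelevant
    return (op,) + tuple(args)

def equal_modulo_ac(s, t, ac_ops=frozenset({"+"})):
    """True if s and t are equal when the operators in ac_ops are treated as AC."""
    return canonical(s, ac_ops) == canonical(t, ac_ops)

left = ("+", "x", ("+", "y", "z"))    # x + (y + z)
right = ("+", ("+", "z", "x"), "y")   # (z + x) + y
```

Here equal_modulo_ac(left, right) holds even though no terminating orientation of
x + y == y + x could rewrite one term into the other.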


LP recognizes two nonempty operator theories: the associative-commutative theory and the 
commutative theory. It contains a mechanism (based on user-supplied polynomial inter- 
pretations of operators) for ordering equations that contain commutative and associative- 
commutative operators into terminating systems of rewrite rules. But this mechanism is 
difficult to use, and most users rely on simpler ordering methods based on LP-suggested 
partial orderings of operators. These simpler ordering methods do not guarantee termina- 
tion when equations contain commutative or associative-commutative operators, but they 
work well in practice. Like manual ordering methods, which give users complete control over 
whether equations are ordered from left to right or from right to left, they are easy to use. 
In striking contrast to manual ordering methods, they have not yet caused difficulties by 
producing a nonterminating set of rewrite rules. 


12.3.4 Induction Rules 


LP uses induction rules to generate subgoals to be proved for the basis and induction steps
in proofs by induction. The syntax for induction rules is the same in LP as in LSL.¹ Users
can specify multiple induction rules for a single sort, e.g., by the LP commands


declare sorts E, S
declare operators
{}: → S
{__}: E → S
__∪__: S, S → S
insert: S, E → S
set name setInduction1
assert S generated by {}, insert
set name setInduction2
assert S generated by {}, {__}, ∪


and can use the appropriate rule when attempting to prove an equation by induction; e.g.,


prove x ⊆ (x ∪ y) by induction on x using setInduction2


In LSL, the axioms of a trait typically have only one generated by for a sort. It is often
useful, however, to put others in the trait’s implications.


¹The semantics of induction is stronger in LSL than in LP, where arbitrary first-order formulas cannot
be written.


12.3.5 Deduction Rules 


LP subsumes the logical power of the partitioned by construct of LSL by allowing users to 
assert deduction rules, which LP uses to deduce equations from other equations and rewrite 
rules. In general, a partitioned by is equivalent to a universal existential axiom, which can 
be expressed as a deduction rule in LP. For example, the LP commands 


declare sorts E, S
declare operator ∈: E, S → Bool
declare variables e: E, x, y: S
assert when (forall e) e ∈ x == e ∈ y yield x == y


define a deduction rule equivalent to the axiom


(∀x, y: S)[(∀e: E)(e ∈ x ⇔ e ∈ y) ⇒ x = y]


of set extensionality, which can also be expressed by assert S partitioned by ∈ in LP, as in
LSL. This deduction rule enables LP to deduce equations such as x == x ∪ x automatically
from equations such as e ∈ x == e ∈ (x ∪ x).


Deduction rules also serve to improve the performance of LP and to reduce the need for user 
interaction. Examples of such deduction rules are the builtin &-splitting law 


declare variables p, q: Bool 
when p & q == true yield p == true, q == true 


and the cancellation law for addition 


declare variables x, y, z: Nat
when x + y == x + z yield y == z


LP automatically applies deduction rules to equations and rewrite rules whenever they are 
normalized. 
12.3.6 Proof Mechanisms in LP


This section provides a brief overview of the proof mechanisms in LP. 


LP provides mechanisms for proving theorems using both forward and backward inference. 
Forward inferences produce consequences from a logical system; backward inferences produce 
lemmas whose proof will suffice to establish a conjecture. There are four methods of forward 
inference in LP. 


• Automatic normalization produces new consequences when a rewrite rule is added to
a system. LP keeps rewrite rules, equations, and deduction rules in normal form. If an
equation or rewrite rule normalizes to an identity, it is discarded. If the hypothesis of a
deduction rule normalizes to an identity, the deduction rule is replaced by the equations
in its conclusions. Users can “immunize” equations, rewrite rules, and deduction rules
to protect them from automatic normalization, both to enhance the performance of
LP and to preserve a particular form for use in a proof. Users can also “deactivate”
rewrite rules and deduction rules to prevent them from being automatically applied.


• Automatic application of deduction rules produces new consequences after equations
and rewrite rules in a system are normalized. Deduction rules can also be applied
explicitly, e.g., to immune equations.


• The computation of critical pairs and the Knuth-Bendix completion procedure produce
consequences (such as i(e) == e) from incomplete rewriting systems (such as the three
rewrite rules for groups). We rarely complete our rewriting systems. However, we often
make selective use of critical pairs. We also use the completion procedure to look for
inconsistencies.


• Explicit instantiation of variables in equations, rewrite rules, and deduction rules also
produces consequences. For example, in a system that contains the rewrite rules
a < (b + c) → true and (b + c) < d → true, instantiating the deduction rule


when x < y == true, y < z == true yield x < z == true


with a for x, b + c for y, and d for z produces a deduction rule whose hypotheses
normalize to identities, thereby yielding the conclusion a < d → true.
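

The instantiation example can be sketched in Python as follows. This is an illustration only,
not LP's mechanism: the dictionary layout of the rule, the set of known facts, and the function
names are all assumptions introduced for the example. Each fact t stands for a rewrite rule
t → true.

```python
# Terms: a str is a variable or constant; a tuple is (operator, arg1, ...).
# All names below are hypothetical and exist only for this sketch.

def instantiate(term, subst):
    """Replace variables in term by the terms given in subst (simultaneously)."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(instantiate(a, subst) for a in term[1:])

# when x < y == true, y < z == true yield x < z == true
TRANSITIVITY = {
    "when": [("<", "x", "y"), ("<", "y", "z")],
    "yield": ("<", "x", "z"),
}

# Facts known to rewrite to true: a < (b + c) and (b + c) < d.
FACTS = {
    ("<", "a", ("+", "b", "c")),
    ("<", ("+", "b", "c"), "d"),
}

def apply_deduction_rule(rule, subst, facts):
    """Return the instantiated conclusion if every instantiated hypothesis is a known fact."""
    hypotheses = [instantiate(h, subst) for h in rule["when"]]
    if all(h in facts for h in hypotheses):
        return instantiate(rule["yield"], subst)
    return None

conclusion = apply_deduction_rule(
    TRANSITIVITY, {"x": "a", "y": ("+", "b", "c"), "z": "d"}, FACTS
)
```

With the substitution from the text, both hypotheses are discharged and the conclusion is the
term a < d. A different substitution, for which a hypothesis is not a known fact, yields nothing.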


There are also six methods of backward inference for proving equations in LP. These methods 
are invoked by the prove command. In each method, LP generates a set of subgoals, i.e., 
lemmas to be proved that together are sufficient to imply the conjecture. For some methods, 
it also generates additional axioms that may be used to prove particular subgoals. 


• Normalization rewrites conjectures. If a conjecture normalizes to an identity, it is a
theorem. Otherwise the normalized conjecture becomes the subgoal to be proved.


• Proofs by cases can further rewrite a conjecture. The command prove e by cases
t_1, ..., t_n directs LP to prove an equation e by division into cases t_1, ..., t_n (or into two
cases, t_1 and ¬(t_1), if n = 1). One subgoal is to prove t_1 | ... | t_n. In addition, for each
i from 1 to n, LP substitutes new constants for the variables of t_i in both t_i and e to
form t_i' and e_i', and it creates a subgoal e_i' with the additional hypothesis t_i' → true. If
an inconsistency results from adding the case hypothesis t_i', that case is impossible, so
e_i' is vacuously true.


Case analysis has two primary uses. If the conjecture is a theorem, a proof by cases 
may circumvent a lack of completeness in the rewrite rules. If the conjecture is not a 
theorem, an attempted proof by cases may simplify the conjecture and make it easier 
to understand why the proof is not succeeding. 


• Proofs by induction are based on the induction rules described above.


• Proofs by contradiction provide an indirect method of proof. If an inconsistency follows
from adding the negation of the conjecture to LP’s logical system, then the conjecture
is a theorem.


• Proofs of implications can be carried out using a simplified proof by cases. The command
prove t_1 ⇒ t_2 by ⇒ directs LP to prove the subgoal t_2' using the hypothesis
t_1' → true, where t_1' and t_2' are obtained as in a proof by cases. (This suffices because
the implication is vacuously true when t_1' is false.)


• Proofs of conjunctions provide a way to reduce the expense of equational term rewriting.
The command prove t_1 & ... & t_n by & directs LP to prove t_1, ..., t_n as subgoals.


LP allows users to determine which of these methods of backward inference are applied 
automatically and in what order. The LP command 


set proof-method &, ⇒, normalization


directs LP to use the first of the three named methods that applies to a given conjecture. 
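

The effect of such a setting can be sketched in Python as a dispatch over an ordered list of
methods. The sketch is a simplification under stated assumptions: conjectures are encoded as
nested tuples, the function name first_applicable is hypothetical, normalization is treated as
always applicable and left as the identity, and the substitution of new constants for variables
in proofs of implications is omitted.

```python
# A conjecture is an atom (str) or a tuple ("&", c1, ..., cn) or ("=>", hypothesis, conclusion).
# A subgoal is a pair (hypotheses, goal). The encoding is an assumption of this sketch.

def first_applicable(conjecture, methods=("&", "=>", "normalization")):
    """Return (method, subgoals) for the first listed method that applies."""
    is_compound = isinstance(conjecture, tuple)
    for method in methods:
        if method == "&" and is_compound and conjecture[0] == "&":
            # Proof of a conjunction: prove each conjunct separately.
            return method, [([], part) for part in conjecture[1:]]
        if method == "=>" and is_compound and conjecture[0] == "=>":
            # Proof of an implication: assume the hypothesis, prove the conclusion.
            hypothesis, conclusion = conjecture[1], conjecture[2]
            return method, [([hypothesis], conclusion)]
        if method == "normalization":
            # Normalization always applies; here the conjecture is returned unchanged.
            return method, [([], conjecture)]
    return None, []

method, goals = first_applicable(("&", "p", ("=>", "q", "r")))
```

For the conjecture p & (q ⇒ r), the first applicable method is & and the subgoals are p and
q ⇒ r; applying the same dispatch to q ⇒ r then selects ⇒.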


Proofs of interesting conjectures hardly ever succeed on the first try. Sometimes the conjec- 
ture is wrong. Sometimes the formalization is incorrect or incomplete. Sometimes the proof 
strategy is flawed or not detailed enough. When an attempted proof fails, we use a variety 
of LP facilities (e.g., case analysis) to try to understand the problem. Because many proof 
attempts fail, LP is designed to fail relatively quickly and to provide useful information when 
it does. It is not designed to find difficult proofs automatically. Unlike the Boyer-Moore 
prover, it does not perform heuristic searches for a proof. Unlike LCF, it does not allow users 
to define complicated search tactics. Strategic decisions, such as when to try induction, must 
be supplied as explicit LP commands (either by the user or by a front-end such as LSLC). 
On the other hand, LP is more than a “proof checker,” since it does not require proofs to be 
described in minute detail. In many respects, LP is best described as a “proof debugger.” 
12.4 Hardware Verification


For many years, engineers have used simulation to convince themselves that the circuits they 
design behave as intended. As circuits get more complex, it becomes tempting to augment 
simulation with formal proofs. Typically, these proofs involve a large number of simple steps. 
Doing them by hand is cumbersome, boring, and prone to mistakes. Unless these proofs are 
machine generated or machine checked, there is very little reason to believe them. 


In the past year, we have conducted several successful experiments using a theorem prover, 
LP, to verify properties of VLSI circuits. We started with several circuits that had previously 
been verified by hand. We then tried to construct machine checked proofs with the same 
structure as the original proofs. In the process of using LP to verify the circuits, we uncovered 
several minor errors in, and simplifications to, the original circuits and manual proofs. 


Any formalized verification of a circuit must be based on an abstract description of the circuit. 
The choice of descriptive mechanism depends upon the intended use. Differential equations, 
for example, are useful in verifying physical properties such as power consumption, timing, 
or heat dissipation. Our approach is aimed at verifying functional properties of a design, 
and is based on describing the circuit as a parallel program, using a language, Synchronized 
Transitions, developed by Jørgen Staunstrup of the Technical University of Denmark.


While the circuits we have verified are not particularly complex, our experiments yielded 
several interesting insights. These include: 


• Even for simple circuits, one cannot rely on proofs that have not been machine checked.


• Combined with Synchronized Transitions, the technique of invariant assertions used to
verify safety properties of concurrent programs is useful for machine-aided reasoning
about circuits.


• The verification process is quite sensitive to the exact way in which the problem is
formulated. For example, proofs seem to work better when induction can be done over
the structure of the circuit rather than over time.


• Circuit verification seems more amenable to machine checking than traditional program
verification.


• The style of mechanical theorem proving supported by LP seems well suited to reasoning
about circuits.
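

The invariant-assertion technique mentioned above can be illustrated with a small Python
sketch. Everything in it is hypothetical: the two-client arbiter, its guarded transitions, and the
function names are invented for the example and are not taken from the circuits that were
verified. The sketch also checks the invariant by enumerating a finite state space, whereas the
experiments reported here used proofs in LP. The check is the usual inductive one: every
transition whose guard holds in a state satisfying the invariant must lead to a state that also
satisfies it.

```python
from itertools import product

# State of a hypothetical arbiter: (r1, r2, g1, g2) = two requests and two grants.
# Each transition is a (guard, action) pair, loosely in the style of guarded transitions.
TRANSITIONS = {
    "grant1": (lambda r1, r2, g1, g2: r1 and not g2,
               lambda r1, r2, g1, g2: (r1, r2, True, g2)),
    "grant2": (lambda r1, r2, g1, g2: r2 and not g1,
               lambda r1, r2, g1, g2: (r1, r2, g1, True)),
    "release1": (lambda r1, r2, g1, g2: not r1,
                 lambda r1, r2, g1, g2: (r1, r2, False, g2)),
    "release2": (lambda r1, r2, g1, g2: not r2,
                 lambda r1, r2, g1, g2: (r1, r2, g1, False)),
}

def mutual_exclusion(r1, r2, g1, g2):
    """Safety property: the two grants are never asserted together."""
    return not (g1 and g2)

def violations(transitions, invariant):
    """List (transition, state) pairs where an enabled transition breaks the invariant."""
    bad = []
    for state in product([False, True], repeat=4):
        if not invariant(*state):
            continue  # only states satisfying the invariant need to be considered
        for name, (guard, action) in transitions.items():
            if guard(*state) and not invariant(*action(*state)):
                bad.append((name, state))
    return bad

# A faulty variant in which grant1 ignores the other grant.
BROKEN = dict(TRANSITIONS)
BROKEN["grant1"] = (lambda r1, r2, g1, g2: r1,
                    lambda r1, r2, g1, g2: (r1, r2, True, g2))
```

For the transitions as given, violations(TRANSITIONS, mutual_exclusion) is empty, so the
invariant is preserved; for the faulty variant it reports the states in which grant1 fires while
the second grant is already asserted.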


12.5 Programming Multiprocessors 


In this research, we consider the problem of writing explicitly parallel programs for small 
to medium sized multiprocessors. To limit the scope of the work, the target applications 
are assumed to be symbolic problems, which are characterized by data structures that are 
irregular and dynamic and by a low percentage of numerical operations. In addition, we
consider only applications for which a sequential solution is possible, i.e., concurrency is 
used to improve performance but is not inherent in the problem definition. 


Various methods have been proposed to address the problem of writing and reasoning about 
parallel programs, but these tend to address only single module programs. Notions of module 
composition are missing and as a result, a style of program development is encouraged in 
which the entire program is designed and implemented as a single unit. Conversely, methods
that have been developed for “programming in the large” of sequential programs are not 
applicable, and attempts to extend them to parallel programs often result in programs that 
exhibit very little real concurrency. Our goal is to be able to decompose parallel programs 
into independently specifiable units without prohibiting efficient implementations. 


The research is organized around an extended example application, which involves parallel
algorithm synthesis, correctness arguments, program module specifications, an implementa- 
tion, and performance measurements. The application is to solve the completion problem 
for term rewriting systems, for which the well known Knuth and Bendix procedure is a se- 
quential solution. Although the completion problem has been studied extensively, there are 
currently no parallel solutions. 


Our parallel solution can be abstractly described by a set of inference rules that are
nondeterministically applied to a system of rewrite rules and equations. In recent years, there
has been a trend toward describing sequential completion procedures in this manner, often 
leading to better algorithms and simpler correctness arguments. The framework is especially 
useful in the context of concurrency, since there are known techniques for reasoning about 
concurrent systems using the possible sequences of state transitions. In this respect, our 
set of inference rules defines the set of possible state transitions, but there is an important 
difference in how we intend to reason about these systems. Rather than reasoning about 
the behavior of each high level transition by considering directly the sequence of low level 
operations that implement it, we intend to use the specifications of the underlying objects
and their operations; each object is thus an independent unit which can be reused in any 
other context for which the same specification is required. 


The implementation effort is still in progress but has already uncovered a number of interesting
tradeoffs in the general programming problem. For example, concurrent data types that
present the illusion of sequential access often lead to poor performance. The performance
may be characterized by either long latency of operations, or little actual concurrency of
multiple operations. In addition, examples of highly concurrent data types tend to have
complex, almost mysterious implementations. In some cases, specification detail can be
added to allow implementations that are more efficient or less complex, but for this we pay
a price in terms of generality of the abstraction. We currently have an implementation of
a completion procedure that exploits a small amount of parallelism, and closely mimics the
behavior of a sequential implementation. As we add more parallelism to the procedure, the 
mapping between the parallel and sequential solutions will become less obvious, and the 
correctness argument more difficult. 
12.6 Logic Programming


Most of the theoretical work on the semantics of logic programs assumes an interpreter that 
provides a complete resolution procedure. In contrast, for reasons of efficiency, most logic 
programming languages are built around incomplete procedures. This difference is rooted 
in Prolog, which evaluates resolvent trees in a depth-first rather than a breadth-first order. 
The gap is widened by some equational logic languages, which combine the incompleteness of 
depth-first evaluation with incomplete approximations to equational unification. Because of 
this gap, it is unsound to reason about logic programs using their declarative semantics. This 
in turn makes it difficult to develop abstraction mechanisms that can be used to partition a 
logic program into independently specifiable modules. 
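

The incompleteness of depth-first evaluation can be seen in a small Python sketch. It models
the search tree of the query p for the two-clause program "p :- p." followed by "p.": every
open goal has a left child (the recursive clause, another open goal) and a right child (the fact,
a success). The string encoding of tree nodes, the step budget, and the function names are
assumptions made for the illustration; no Prolog system is involved.

```python
from collections import deque

# A node is the string of clause choices made so far: "L" = recursive clause, "R" = fact.

def is_solution(node):
    """A derivation succeeds as soon as the fact (right branch) is chosen."""
    return node.endswith("R")

def children(node):
    """An open goal can be resolved with either clause; a solution has no children."""
    return [] if is_solution(node) else [node + "L", node + "R"]

def search(strategy, max_steps):
    """Return the number of nodes visited before finding a solution, or None if the budget runs out."""
    frontier = deque([""])
    for step in range(1, max_steps + 1):
        if not frontier:
            return None
        if strategy == "depth-first":
            node = frontier.pop()        # most recently added node first
        else:
            node = frontier.popleft()    # oldest node first
        if is_solution(node):
            return step
        kids = children(node)
        if strategy == "depth-first":
            frontier.extend(reversed(kids))  # so that the left child is visited first
        else:
            frontier.extend(kids)
    return None

depth_first_result = search("depth-first", 1000)
breadth_first_result = search("breadth-first", 1000)
```

The breadth-first search visits the root, the left child, and then the successful right child,
whereas the depth-first search follows the recursive clause until its budget is exhausted, even
though the query is a logical consequence of the program.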


In this work, we consider the role type systems can play in closing the gap between
the operational and declarative semantics of logic programs. We develop the notion of an 
equational mode system for use in constraining the domains of both predicates and unification 
procedures. The mode system is used to guide the resolution-based interpreter, and as a 
result, we can show that two predicate implementations with the same declarative meaning 
will be operationally equivalent. 


This work was done in conjunction with Joseph Zachary of the University of Utah. 
13.1 Introduction


The MIT/LCS Theory of Computation (TOC) Group is one of the largest theoretical com- 
puter science research groups in the world. It includes faculty, students, and visitors from 
both the Departments of Electrical Engineering and Computer Science, and Applied Mathematics.


The principal research areas investigated by members of the TOC Group are: 


• algorithms: combinatorial, geometric, graph-theoretic, number theoretic;

• cryptology;

• computational complexity;

• parallel computation;

• distributed computation: algorithms and semantics;

• machine learning;

• semantics and logic of programs; and

• VLSI design theory.


Group members were responsible for over 150 publications and several dozen public lectures 
around the world during the past year. The individual reports by faculty and students in the 
next sections, and the annotated reference and lecture lists offer further descriptions of the 
year’s activities. 


The following major research contributions merit highlighting: 


• Awerbuch, Mansour, and Shavit’s polynomial solution to the basic network problem 
of “end to end communication”. 


• Awerbuch and Sipser’s efficient implementation (constant time overhead) of the new 
notion of a “synchronizer for dynamic networks” implying that dynamic networks are 
as fast as static networks. 


• Elias’ geometric demonstration that reliable communication at a positive rate is 
possible over a channel which introduces a fraction 1/2 − ε of errors, so long as the receiver 
is allowed to list O(1/ε²) possible transmitted codewords rather than just one. 


• Fortnow and Sipser’s oracle collapsing the probabilistic polynomial time hierarchy. 


• Koch proved a decade-old conjecture about the expected throughput of the dilated 
butterfly switching network that has application to the optimal design of networks like 
that used in the BBN Butterfly Machine (Ph.D. thesis). 


• Leighton and Maggs developed highly efficient packet routing algorithms for a twin 
butterfly. The algorithms are the first fault-tolerant routing algorithms for bounded 
degree switching networks, and appear to be superior to currently used algorithms even 
if there are no faults. 


The following are special awards received during the period: 


• Lynch was chosen to deliver the keynote address at last summer’s Symposium on 
Principles of Distributed Computing. 


• Meyer was chosen to present an invited lecture at the Third IEEE Symposium on Logic 
in Computer Science, July 1988. 


• Sipser was chosen as the principal lecturer in the American Mathematical Society 
Conference on Circuit Complexity to be held this August in Chicago. 


Baruch Awerbuch 


Awerbuch has been working on designing efficient and reliable distributed protocols, with 
emphasis on issues related to dynamic networks. 


He put a great deal of effort into development of an efficient compiler for dynamic network 
protocols. In [25] he used techniques of amortized analysis to improve the best known 
compiler for asynchronous protocols. Together with Sipser [32], he introduced a new concept 
of dynamic synchronizer which allows us to apply static synchronous protocols in a dynamic 
asynchronous network. This protocol is very fast, requiring O(1) time overhead, thus showing 
that dynamic asynchronous networks are as fast as static synchronous ones. Finally, working 
with Afek and Moriel (Tel-Aviv) [3], he discovered a compiler whose overheads depend 
exclusively on the overheads of the original protocol. 


Awerbuch also worked on many specific problems in dynamic networks. Together with 
Shavit and Mansour [31], Awerbuch discovered the first polynomial solution to the end-to- 
end communication problem. This is one of the basic network problems; it was conjectured in 
[4] that it has no polynomial solution. Together with Goldberg (Stanford), Luby (ICSI) and 
Plotkin (Stanford), he found a new technique [28] for removing randomness from distributed 
computing that has yielded fast deterministic algorithms for Maximal Independent Set, Δ+1 
Coloring and Breadth First Search. Together with Kutten (IBM Yorktown) and Cidon 
(IBM Yorktown), he discovered an efficient algorithm for maintaining a tree in a dynamic 
network. Together with Goldreich and Herzberg (Technion) [29], he developed a quantitative 
framework for analyzing performance of broadcast protocols in dynamic networks. 
Another area of Awerbuch’s research has been in distributed graph algorithms. Together with 
Bar-Noy (Stanford University), Linial (IBM Almaden Research Center), and Peleg (Stanford 
University) [27], he discovered new routing schemes that use only bounded space, have low 
communication overhead, can be constructed online, work for weighted graphs, and do not 
require changes in node identities. He discovered a new efficient BFS and Shortest Paths 
algorithm [26], which is efficient both in time and in communication. This algorithm has 
an interesting recursive structure. Together with Goldreich (The Technion), Peleg (Stanford 
University), and Vainish (The Technion) [30], Awerbuch studied performance of broadcast 
protocols in point-to-point networks. 


Peter Elias 


The paper on the zero-error capacity of a binary channel under jamming using list decoding, 
which was accepted for publication at the time of the last annual progress report, has since 
appeared [98]. Its appearance led to correspondence with Körner, who has been working 
on related topics with Marton and Simonyi. Their work arose from a paper on hashing by 
Fredman and Komlos [114]. They published one paper [185] and submitted two more, which 
include new results relevant to zero-error capacity under list decoding. 


The second paper mentioned in the last progress report, which does not have to do with 
zero-error capacity but with error-correcting codes under list decoding, has appeared as a 
technical report and has been submitted for publication [99]. 


Current work explores iterative coding schemes. These schemes generate codes which differ 
from typical error-correcting block codes in that they are not guaranteed to correct all sets 
of less than k errors out of n for some integers k,n but only most such sets. Only codes with 
this property can be used to communicate at rates near channel capacity: as discussed in 
[98] and [99], the capacity of a channel subject to a jammer who can alter any k symbols 
out of n is significantly less than that of a channel in which bits are subject to statistically 
independent errors with probability k/n. 


The first analysis of these codes appeared in [97]. It showed that they could be used to 
transmit without error at a positive rate, by using check symbols to correct each row of 
transmitted symbols, rows of check symbols to correct each column in a two dimensional 
array, layers of check symbols to check preceding layers in a rectangular solid, and so on. The 
fraction of the symbols used for checking is less than 1 in the limit if the sizes of successive 
dimensions increase, e.g., in a geometric series. 
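
The row-and-column checking idea can be sketched in its simplest form. The example below is an assumption-laden simplification, not the construction analyzed in [97]: it uses a single even-parity bit per row and per column of a two dimensional array, which suffices to locate and correct one flipped bit. The function names are illustrative.

```python
def encode(data):
    """Append an even-parity bit to each row, then a parity row over the columns."""
    rows = [row + [sum(row) % 2] for row in data]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]

def correct_single_error(block):
    """Flip the bit at the intersection of the failing row and failing column, if any."""
    bad_rows = [i for i, row in enumerate(block) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*block)) if sum(col) % 2]
    fixed = [row[:] for row in block]
    # A single flipped bit violates exactly one row check and one column check.
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed
```

A higher-dimensional version would add further layers of check symbols over the preceding layers, as the paragraph above describes.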


In [97], each order of check symbols is used only once and then discarded. That sufficed 
to show that communication at a positive rate is possible, but the proof gives a rate sub- 
stantially below channel capacity. The rates of iterated codes come much closer to capacity 
when lower order check bits are used to make further corrections after each use of higher 
order check bits, and the process is continued until a stable state is reached. Since statis- 
tical independence disappears after such recycling, getting tight bounds on the amount of 
improvement is difficult. Both analysis and simulation are being used to explore this domain. 
Shafi Goldwasser 


Goldwasser’s work focused on designing efficient digital signature schemes and on designing 
multi-party secure cryptographic protocols. 


Beaver and Goldwasser [36] designed a protocol for n processors, a majority of which can be 
faulty, to compute any polynomial time function defined on the processors’ private inputs. 
The function is computed preserving privacy. Namely, no coalition of faulty processors 
can discover more about non-faulty processors’ inputs than is implied by the function value. 
Moreover, the faulty processors can find out the function value “if and only if” the non- 
faulty processors find out the function value, in a strong probabilistic sense. This is the 
first solution in the case where the faults constitute more than a majority of the network 
processors. 


Ben-Or, Kilian, Goldwasser, and Wigderson [43] designed two extremely efficient user iden- 
tification methods (using no modular multiplications and based on the difficulty of the NP- 
complete subset-sum problem). These schemes work in the two prover interactive proof 
model introduced by the same authors in ’88. Namely, the prover (e.g., Bank card holder) 
is split into two agents, and the verifier (e.g., the Bank teller machine) guarantees that the 
two agents cannot transfer information to each other during the identification process. 


Bellare and Goldwasser [37] introduced new paradigms for digital signatures and message 
authentication which are a complete departure from the digital signature schemes based on 
the Diffie-Hellman trapdoor function model or the recent digital signature scheme of Naor-Yung. 
The new scheme is based on the use of random functions and noninteractive zero-knowledge 
proofs. 


Goldwasser has also been developing a monograph of lecture notes in cryptography, an 
outgrowth of her lectures in the MIT cryptography and cryptanalysis class. Goldwasser 
chaired the CRYPTO-88 conference held in Santa Barbara in August 1988. She was a 
member of the STOC 1989 conference committee, and together with Rivest, wrote a survey 
article on cryptography for the handbook on computer science. 


Tom Leighton 


Together, Leighton and his students made solid progress on packet routing algorithms, fault 
tolerance in networks, and on graph embedding problems. At this point they are getting 
close to asymptotically optimal results that also appear to work well in reality. In fact, the 
highlight of the coming summer and fall will be to help design and lay out a multibutterfly 
network for Tom Knight’s new machine. With a little luck, theory will be able to play an 
important role in the development of a state of the art machine. They are also working with 
Bill Dally and his students to see if theory can be helpful with the routing protocols on his 
new machine, and have been talking with Alan Baratz about the possibilities of implementing 
some of the new theory routing algorithms on the GF11 so that it can become a general 
purpose routing machine. 
Another highlight of the coming year will be the new ACM Symposium on Parallel 
Algorithms and Architectures that Leighton has been helping to organize. The first meeting will 
be in Santa Fe in mid-June, and there should be a large contingent from MIT at the meeting. 
Papers to be presented range from theory to practice, and the meeting should provide a good 
forum for interaction between people who think about parallel machines, those who build 
them, and those who use them. The 1990 meeting is in Crete, so now would be a good time 
to start thinking about submitting a paper! 


Maggs, Rao, Koch, and Newman are students getting their Ph.D.’s this year. 


In addition, Leighton is continuing work on his book on parallel computation. He expects 
to have Volume I done by early next year. 


Charles E. Leiserson 


Leiserson returned in January from a leave of absence at Thinking Machines Corporation, 
where he worked on the design of a parallel computer. He was an invited speaker at the 
25th Anniversary Symposium for Project MAC at MIT, and at the Decennial Caltech VLSI 
Conference. He served on the program committee for the IEEE Foundations of Computer 
Science Conference. He also served on the first program committee for the ACM Symposium 
on Parallel Algorithms and Architectures. 


Leiserson has spent much of his time in the past year working on a textbook entitled In- 
troduction to Algorithms, coauthored with Cormen and Rivest. The textbook attempts to 
provide a rigorous, but elementary, introduction to the area of analysis of algorithms. It will 
be published jointly by MIT Press and McGraw-Hill later this year. 


Two of Leiserson’s Ph.D. students completed their degrees in the past year. Plotkin’s thesis is 
entitled Graph-Theoretic Techniques for Parallel, Distributed, and Sequential Computation. 
Plotkin assumed a postdoctoral position at Stanford and will be an assistant professor in 
the fall. Blelloch’s thesis is entitled Scan Primitives and Parallel Vector Models. Blelloch 
accepted an assistant professorship at Carnegie-Mellon University. 


Three students completed their Master’s degrees under the supervision of Leiserson: Ishii, 
with the thesis A Digital Model for Level-Clocked Circuitry; Park, with the thesis Notes on 
Searching in Multidimensional Monotone Arrays; and Fried, with the thesis VLSI Processor 
Design for Communication Networks. 


Leiserson has also been supervising Cormen, Greenberg, Kipnis, Maggs, Phillips, and 
Papaefthymiou. 


Nancy A. Lynch 


Please see entry under the chapter on Theory of Distributed Systems. 
Albert R. Meyer 


Meyer’s research has focused on semantics and logic of programming languages. During the 
past year, he worked on the following: 


Research Topics 


• Semantics of Concurrency: Meyer, with Bloom and Istrail (Wesleyan), question the 
foundations of Hoare’s CSP and Milner’s CCS theories of concurrency [51][53][52]. 
They propose a new notion of process equivalence and show it lies strictly between 
that of CSP and CCS. See the report of Bard Bloom for more complete discussion. 


• Semantics of Terminating Evaluation: Research with Bloom, Riecke, and Cosmadakis 
(IBM Watson Research Center) on the general connection between operational and 
denotational semantics, focusing on repairing the mismatch between semantics in which 
expressions M and λx.M mean the same thing, even though evaluation of M diverges 
but evaluation of λx.M terminates immediately, cf. [233][85][54]. See the report of 
Riecke. 


• Dataflow Semantics: See the report of Rudich. 


• Theory of Sequential Functions: See the report of Jim. 


• Type-checking for Records with Inheritance: See the report of Jategaonkar. 


Professional Activities 


• Chairman, MIT Project MAC 25th Anniversary Celebration, October 1988. 


• Conference Chairman, IEEE Symposium on Logic in Computer Science (LICS), 
Seattle, WA, May 1989. 


• Moderator for three Computer Science research email forums on (1) types, 
(2) concurrency, and (3) logic. 


• Member, Program Committee, International Symposium on Logic at Botik, 
Pereslavl-Zalessky, USSR, July 1989; “Kleene ’90” Logic Symposium, Chaika, Bulgaria, June 
1990. 


• Thesis Supervision: 


PhD  Bloom, expected September 1989. 

SM   Riecke, completed January 1989 [273]. 
     Jategaonkar, expected September 1989 [171]. 
     Jim, expected January 1990. 
     Rudich, expected January 1990. 

SB   Ernst, completed May 1989 [100]. 
     Lent, expected May 1990. 
     Siegel, expected September 1989. 


• Editorial Activity: 


Editor-in-Chief, Information and Computation; Managing Editor, Annals of Pure and 
Applied Logic; Editorial Board Member, SIAM Journal on Computing, Journal of 
Computer and System Sciences, Theoretical Computer Science, and Advances in Applied 
Mathematics; Advisory Editor, Handbook of Logic in Computer Science and Handbook 
of Theoretical Computer Science; Co-editor, Proceedings of Logic at Botik [235]; MIT 
Press Foundations of Computing Series Co-editor; MIT Press Editorial Board Member. 


Silvio Micali 


Micali’s work focused on cryptography and zero-knowledge proofs. In particular, the follow- 
ing results were obtained: 


1. Goldreich, Micali, and Wigderson previously proved that all theorems in NP possess a 
zero-knowledge proof. Extending that work, [41] showed that whatever can be efficiently 
verified can be proven in zero knowledge. 


2. [237] constructed a very efficient “password” scheme. The person seeking identification 
is required to perform the equivalent of two multiplications modulo an integer that 
is hard to factor. These special “passwords” are hard to compromise both by someone 
simply listening to the identification process and by the password verifier herself. 


Ronald L. Rivest 


Rivest’s work focuses on the theoretical aspects of machine learning. 


Rivest is continuing to work with Schapire on problems related to the inference of finite 
automata. Their motivation has been the “artificial intelligence” problem faced by a robot 
placed in an unfamiliar environment with no a priori knowledge of its world. The goal of 
the robot is to learn the structure of its environment through systematic experimentation. 


Schapire and Rivest [274] have been developing an interesting extension to Angluin’s finite 
automaton inference procedure [15]. The new algorithm can infer an automaton even when 
no “reset” is available (i.e., there is no means of bringing the automaton back to the start 
state), and can be used for inferring automata using either the global state-space representa- 
tion or the diversity-based representation previously developed by Rivest and Schapire. The 
algorithm has been implemented and seems quite efficient in practice. 


Together with Goldman and Schapire, Rivest studied the problem of “learning a binary 
relation” [130]. In this problem, the entries of a matrix representing a binary relation are 
repeatedly probed. Before each probe, the “learner” must predict the value of the matrix 
entry about to be probed. The goal of the learner is to make as few prediction errors as 
possible. In order to model the natural “structure” that may be present in many binary 
relations, such structure being what gives the learner the leverage needed to make fewer 
than the maximum possible number of prediction errors, it is assumed that there are only 
a small number k of different row types. Algorithms are developed and analyzed that make 
a small number of errors in this case, and some interesting lower bounds (based on the 
existence of projective geometries) are proved. 


Goldman and Rivest [129] also worked on the problem of efficiently implementing the “halv- 
ing algorithm.” The halving algorithm applies to situations (like the relation-learning prob- 
lem of the last paragraph) where the learner must predict the classification of each instance 
before being told the true classification, and where the learner’s goal is to minimize the 
number of prediction errors made. The halving algorithm (due to Barzdin and Freivalds 
[35], and refined by Littlestone [215]) predicts according to the majority of the hypotheses 
consistent with all previous data; when a prediction error is made it therefore reduces by at 
least half the number of consistent hypotheses remaining. Based on a proposal by Warmuth, 
Goldman and Rivest have investigated the use of approximate counting schemes in order to implement 
approximations to the halving algorithm. This idea can be made to work out, and can be 
applied to problems such as learning a total order. (This problem is then rather like the 
problem of sorting, where an adversary gets to pick which elements are to be compared next, 
and where you must predict the outcome before each comparison is made.) 
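
The halving algorithm described above can be sketched as follows. This is a minimal illustration that enumerates the hypothesis class explicitly, which is exactly the cost the approximate counting approach tries to avoid; the function name and the representation of hypotheses as Python callables are assumptions made for the example.

```python
def halving_learner(hypotheses, stream):
    """Run the halving algorithm and return the number of prediction errors.

    hypotheses: list of functions mapping an instance to 0 or 1
    stream: iterable of (instance, true_label) pairs consistent with
            at least one hypothesis in the list
    """
    consistent = list(hypotheses)
    mistakes = 0
    for x, label in stream:
        # Predict with the majority vote of the still-consistent hypotheses
        # (ties are broken in favor of 0).
        votes = sum(h(x) for h in consistent)
        prediction = 1 if 2 * votes > len(consistent) else 0
        if prediction != label:
            mistakes += 1
        # Discard every hypothesis that disagrees with the revealed label;
        # after a mistake, at most half of them survive.
        consistent = [h for h in consistent if h(x) == label]
    return mistakes
```

Because each error removes at least half of the consistent hypotheses, the number of errors is at most the base-2 logarithm of the size of the hypothesis class.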


Sloan has finished up his Ph.D. under Rivest’s supervision [290]. His thesis explores a number 
of fascinating issues and topics in machine learning theory, such as the effect of noisy data on 
learnability, techniques for learning a complicated concept reliably and usefully by learning 
it “gate by gate” (subconcept by subconcept), and methods for combining classical Bayesian 
inference with computational complexity considerations. 


Linial, Mansour, and Rivest extended and presented their work showing that a finite Vapnik- 
Chervonenkis dimension is not a limitation for learning a concept class, if the size of the data 
sample used for learning can be adjusted dynamically as learning proceeds [209]. Intuitively, 
an algorithm can dynamically request more data when it discovers that the concept being 
learned is “complex.” 


Blum finished up his Master’s thesis [55] under Rivest’s supervision, and the work he and 
Rivest have done on the complexity of training even very simple neural networks was 
presented at NIPS [57]. The basic result is that training a three-neuron neural network is 
NP-complete. 


Under Rivest’s supervision, Perugini has experimentally examined the effect of training set 
data size on the efficacy of the “back-propagation” training algorithm for neural nets [259]. 
The results were not crisp, but some interesting pathologies were uncovered. 


Together with Cormen and Leiserson, Rivest worked on an introductory text on algorithms 
[84]. This text should be suitable for both introductory undergraduate and introductory 
graduate students; it should be out later this year. 
David B. Shmoys 


Shmoys studied a wide range of questions in the design and analysis of efficient algorithms. 
He continued his work in the design and analysis of approximation algorithms, as well as in 
the design of parallel algorithms for graph problems. 


One of the most important algorithms used in the solution of the traveling salesman problem 
is a procedure due to Held and Karp [157] that produces an extremely tight lower bound on 
the value of the optimal solution. With Williamson [287], Shmoys considered this procedure 
as an approximation algorithm for the value of the optimal TSP solution. First, they showed 
that the algorithm has an important monotonicity property, in the sense that the bound 
delivered for a subset of the input is no more than for the entire input. This property makes 
it possible to prove that the procedure delivers a value at least 2/3 of the optimal value. 
Unlike Christofides’ algorithm, which is the best known approximation algorithm for the 
problem (and guarantees identical performance), this bound is not known to be tight. 


One major area of Shmoys’ research is in the area of the theory of scheduling. Together 
with Lawler (UC/Berkeley), Lenstra (CWI) and Rinnooy Kan (Erasmus) [197], he wrote a 
survey article of the field. This survey was written as part of the preparation for a book on 
this subject by these authors. 


With Hall (Sloan School/MIT) [143][142], Shmoys has been considering a variety of approx- 
imation algorithms for scheduling problems. In particular, he has been studying the effect 
of precedence constraints and related timing constraints on the possibility of obtaining good 
approximate solutions. Hall and Shmoys [143] consider the problem of scheduling n jobs on 
a single machine, where each job j has a specified release date r_j before which it cannot be 
processed, a time p_j that specifies the amount of (continuous) processing required, and a 
deadline d_j. (For technical reasons, the deadlines are non-positive.) If the lateness of a job is 
the difference between the time that a job completes processing and its deadline, the aim is 
to find a schedule that minimizes the total lateness. For the variant of the problem without 
precedence constraints, a polynomial approximation scheme is obtained. For the problem 
with precedence constraints, they give an algorithm that delivers a solution that finishes 
within a factor of 4/3 of the optimal time (improving on the previous best algorithm that only 
came within a factor of 2). This represents an interesting breakthrough of a “factor of 2” 
barrier that is prevalent in approximation algorithms for precedence constrained scheduling 
problems. Also with Hall [142], Shmoys considers the natural generalization of the previous 
work to the case when there are parallel identical machines to do the processing. For this 
problem without precedence constraints, a polynomial approximation scheme was obtained. 
With precedence constraints, an algorithm that delivers a solution within a factor of 2 of 
the optimal was obtained. 


In the area of parallel graph algorithms, together with Goldberg (Stanford), Plotkin 
(Stanford), and Tardos [124], Shmoys considers the question of parallel algorithms for bipartite 
matching. By using techniques developed for general purpose sequential algorithms for linear 
programming, so-called interior-point methods, they obtain an algorithm that requires only 
O*(√m) steps on a polynomial number of processors, where m denotes the number of edges 
in the graph, and O* indicates that lower order polylogarithmic factors have been ignored. 
Michael Sipser 


Sipser is continuing his work on lower bounds in complexity theory and the structure of 
complexity classes. 


One of the important achievements of the past year was the construction of an oracle col- 
lapsing the probabilistic time hierarchy done jointly with Fortnow [112]. Time hierarchies 
for deterministic and nondeterministic computation are among the earliest results proved in 
complexity theory. They show that if one is allowed a little more time then one can solve a 
larger class of problems. Oddly, this has never been established for probabilistic computa- 
tion. It is possible that any problem solvable in probabilistic polynomial time can also be 
solved in probabilistic linear time, surprising though this would be. Our result shows why 
this problem has remained open. The existence of our oracle indicates that the techniques of 
recursive function theory, which solved the previous cases, are insufficient to solve this case. 
Fortnow is receiving his Ph.D. this year under Sipser’s guidance. 


Sipser also considered some problems in the theory of distributed computing. Together with 
Awerbuch, he gave a method which facilitates the design of network protocols [32]. Using 
this method, one can first design a protocol to run on a static, synchronous network and 
then automatically convert it to run on a dynamic, asynchronous network. The former 
network model is a simpler one on which to conceive designs, whereas the latter model is 
more realistic. 


Together with Boppana, Sipser prepared a definitive survey on lower bounds on the circuit 
complexity of boolean functions [60]. This will appear in the forthcoming Handbook of 
Theoretical Computer Science. Sipser was selected to be the principal speaker at an American 
Mathematical Society conference on circuit complexity. He will prepare a monograph of these 
lectures to be included in the AMS CBMS series. 


Eva Tardos 


Tardos has been working on combinatorial optimization problems. Together with Goldberg 
and Plotkin from Stanford and Shmoys from MIT [124], she developed an O*(√m) time 
algorithm, where n and m denote the number of nodes and edges of the input graph and an 
algorithm is said to run in O*(f(n)) time if it runs in O(f(n) log^k(n)) time for some constant 
k. In this paper, interior-point methods for linear programming, developed in the context of 
sequential computation, are used to obtain a parallel algorithm for the bipartite matching 
problem. The results extend to the weighted bipartite matching problem and to the zero-one 
minimum-cost flow problem, yielding O*(√m log C) algorithms, where it is assumed that 
the weights are integers in the range [−C ... C] and C > 1. These results improve previous 
bounds on these problems and introduce interior-point methods to the context of parallel 
algorithm design. 


In a joint paper with Plotkin from Stanford [266], Tardos gave an improved dual network 
simplex algorithm. A simplified version of Orlin’s [248] strongly polynomial minimum-cost 
flow algorithm is developed, and it is shown how to convert it to a dual network simplex. 
The pivoting strategy leads to an O(m² log n) bound on the number of pivots, which is better 
by a factor of m compared to the previously best pivoting strategy due to Orlin [247]. Here 
n and m denote the number of nodes and arcs in the input network. 


In a joint paper with Frank from Budapest and Nishizeki, Saito, and Suzuki from Tokyo, 
Tardos developed simple efficient algorithms for the routing problems around a rectangle. 
These algorithms find a routing with two or three layers for two-terminal nets specified on the 
sides of a rectangle. The minimum area routing problem is also solved. All algorithms run in 
linear time. The minimum area routing problem was previously considered by LaPaugh and 
Gonzalez and Lee. The algorithms they developed run in time O(n?) and O(n), respectively. 
The simple linear time algorithm is based on a theorem of Okamura and Seymour, and on a 
data structure developed by Suzuki, Ishiguro, and Nishizeki. 


Tardos has also written two surveys this year: a general survey on complexity theory for The 
Handbook of Combinatorics [286], jointly with Shmoys from MIT, and a survey on recent 
developments in the theory of network flows [125], jointly with Goldberg from Stanford and 
Tarjan from Princeton. 


13.2 Student, Research Associate, and Visitor Reports 


Javed A. Aslam 


Aslam has been working with Rivest on algorithms for machine learning. Specifically, he 
has been studying the radial mapping problem where a device must infer the shape of its 
surroundings by rotating in place and taking distance measurements. Relevant cases studied 
have included those where angular positioning error and distance measurement error are 
present in varying degrees. Aslam recently began work on the inference of Markov chains, 
and plans to continue this work with Rivest over the summer. 


Mihir Bellare 


Basic cryptographic primitives such as zero-knowledge proofs and oblivious transfer have 
classically relied on interaction between the parties involved. A part of Bellare’s work has 
focused on a new public key model in which such interaction can be removed. 


Bellare and Micali [38] proposed a method via which a collection of users may first establish 
public keys and then be able to accomplish oblivious transfer without interaction. Using 
earlier work of [236], this yields noninteractive methods for zero-knowledge proofs. 


Bellare and Goldwasser [37] demonstrated the wide applicability of such noninteractive 
zero-knowledge proofs by using them to get simple and efficient schemes for digital signatures 
and message authentication. A feature of this work was an implementation of noninteractive 
zero-knowledge proofs which could be checked by any user in the system rather than by a 
single recipient. 
In further work related to the role of interaction in zero-knowledge proofs, Bellare, Micali, and 
Ostrovsky [39] showed that the languages of graph isomorphism and quadratic residuosity 
have constant round perfect zero-knowledge interactive proofs. They also provided a general 
mechanism to collapse rounds in a statistical zero-knowledge proof while preserving the 
statistical zero knowledge, given some standard cryptographic assumption. 


Bonnie Berger 


Berger has been working on removing randomness from parallel and sequential algorithms. 
This involves coming up with a randomized algorithm for a problem, if one does not exist, 
and devising or using known techniques to remove this randomness. 


Berger began this work at Bell Labs last summer when, with Shor, she devised a randomized 
sequential algorithm for the acyclic subgraph problem (the dual of the feedback arc set 
problem) and used known, highly sequential techniques to convert it to a deterministic 
one, thereby achieving tight bounds deterministically for the problem [48]. This work also 
included an RNC algorithm for the problem which, by applying techniques explored in her 
subsequent work, Berger is attempting to convert to a deterministic one. 


Berger’s subsequent work centered around removing randomness from parallel algorithms. 


Berger and Rompel [46][44] developed a general framework for removing randomness from
randomized NC algorithms whose analysis uses only polylogarithmic independence. Previously,
no techniques were known to determinize those RNC algorithms depending on more
than constant independence. One application of their techniques is an NC algorithm for the
set discrepancy problem, which can be used to obtain many other NC algorithms, including
a better NC edge coloring algorithm. As another application of their techniques, they
provided an NC algorithm for the hypergraph coloring problem. This work has been chosen for
the FOCS 89 Machtey Award.


Berger, Rompel, and Shor [47] gave NC approximation algorithms for the unweighted and
weighted set cover problems. Their algorithms use a linear number of processors and give a
cover that has at most log n times the optimal size/weight, thus matching the performance
of the best sequential algorithms. Previously, there were no known parallel algorithms for
the general set cover problem. Berger, Rompel, and Shor devised a randomized algorithm,
depending on only pairwise independence, and then converted it to a deterministic one.
The difficult part here was coming up with the randomized algorithm. Furthermore, they
applied their set cover algorithm to learning theory, giving an NC algorithm to learn the
concept class obtained by taking the closure under finite union or finite intersection of any
concept class of finite VC dimension which has an NC hypothesis finder. In addition, they
gave a linear processor NC algorithm for a variant of the set cover problem first proposed
by Chazelle and Fredman, and used it to obtain NC algorithms for several problems in
computational geometry.
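

As a point of reference for the sequential performance mentioned above, the following is a
minimal Python sketch of the classical greedy heuristic for unweighted set cover, which is
known to achieve a logarithmic approximation factor. It is not the NC algorithm of [47];
the function name and the small instance are illustrative assumptions.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen by the classical greedy heuristic."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance, chosen only for illustration.
universe = range(1, 8)
subsets = [{1, 2, 3, 4}, {4, 5, 6}, {6, 7}, {1, 5}, {2, 7}]
cover = greedy_set_cover(universe, subsets)
```

On this instance the heuristic selects subsets 0, 1, and 2, which together cover all seven
elements.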


Bard Bloom


Bloom, working with Meyer and Istrail (Wesleyan), is studying the denotational semantics
of parallel and nondeterministic processes. Dana Scott’s very successful models for the
semantics of sequential, deterministic programs do not extend naturally to the more general
domain. There are a number of proposals for a replacement; Meyer and Bloom are
investigating several of these models. One central question in semantics is, “when shall we
consider two programs equivalent?” Two proposed notions are trace congruence (used in Hoare’s
language CSP and variants) and bisimulation (used in Milner’s SCCS). Bloom, Meyer, and
Istrail have found an extension of SCCS in which the two notions coincide. The new
operation is somewhat peculiar in nature; they have shown that no finite set of operators defined
in a clean way can cause the two to coincide. Similarly, bisimulation cannot be understood
as equivalence with respect to any set of reasonable experiments. It can be understood in a
probabilistic setting; however, the translation from the usual setting to the probabilistic one
is not effective.


This work led to a notion of “ready simulation” which seems to have the same sorts of
formal properties as bisimulation (various alternate definitions, complete axiomatizations
and polynomial time decision procedures for finite processes, and so forth), but can also be
understood as congruence with respect to a fairly reasonable language.


A classic paper in denotational semantics (Gordon Plotkin’s LCF Considered as a
Programming Language) gives two kinds of semantics for a simple but extremely powerful language
based on typed lambda calculus. One semantics is operational, describing how a particular
interpreter computes; the other kind is denotational, assigning meaning to the programs in
moderately familiar mathematical terms, using several varieties of Scott domains. The paper
shows that the two semantics coincide in a weak sense (computational adequacy: two integer
terms evaluate to the same constant if they have the same denotational meaning), but not
in a stronger sense (full abstraction: two routines behave identically in all contexts if they
have the same denotational meaning). The programming language can be extended by the
addition of a “parallel conditional” such that the extended language is fully abstract for one
of the denotational models. The classic paper shows that this extension is not fully abstract
for the other models.


However, one of the other denotational models (Scott domains built from complete lattices
rather than cpo’s) is mathematically appealing, and it is somewhat surprising that the classic
paper did not find a fully abstract extension of LCF using this model. However, this is not
the author’s oversight. Bloom has shown that there is no extension of LCF
with a reasonable evaluator for which this model is fully abstract, where “reasonable” means
that an arithmetic expression can evaluate to at most one value. If the evaluator is not
required to be reasonable in this sense, there is a simple extension of LCF after the spirit of
the classic paper which is fully abstract for the lattice model. If the evaluator is allowed to
have a technically peculiar property, it can be made fully abstract for virtually any model
of the typed lambda calculus.

Bloom and Riecke have been investigating similar questions for the so-called “lifted Scott
domains.” Ordinary functional languages exhibit some behavior on higher-order terms: if
a program evaluates to a function, it stops and prints “function,” even if the function will
always diverge when applied to any argument. In ordinary Scott domains, there is no 
semantic difference between the function which always diverges given any argument, and a 
divergent computation of functional type. Lifted domains repair this deficiency. Bloom and 
Riecke have achieved a close correspondence between operational and denotational semantics 
for this setting, and are investigating axiom systems. 


Avrim Blum 


Blum has been working in two main areas this past year and has also finished his Master’s 
thesis [55] under Rivest’s supervision. 


He continued his work with Rivest on problems in computational learning theory, in
particular, computational complexity issues in the training of neural networks. One result of this
work is a proof that training a very simple neural network with only three computational
nodes is NP-complete. This work was presented at the NIPS and COLT conferences [57].


Blum has also been working on approximate graph coloring. The 3-coloring problem is one 
of the most well known NP-complete problems, but there is an enormous gap between the 
results achieved by the best approximation algorithms for this problem and the best lower 
bounds known. Blum devised a new approximation algorithm [56] that reduced this gap 
somewhat and introduced different techniques for attacking this problem. 


Thomas H. Cormen 


Cormen continued his work on the textbook Introduction to Algorithms with Leiserson and
Rivest. He plans to start working on parallel computing research over the summer.


Lenore Cowen 


Cowen continues work with Goldwasser on two areas: key exchange protocols and information 
theoretic properties of private functions. 


Claude Crépeau 


Crépeau’s current research interest is mainly the study of two-party cryptographic
protocols. His earlier study of disclosure protocols [63][64] evolved into a series of results
[86][89][88][177][87] essentially stating that very complex two-party protocols known as fair
oblivious circuit evaluation (see [87] for definition) can be achieved from very simple devices.
Such a device can be a simple noisy channel, for instance. Another such possible device
follows the lines of Bennett and Brassard and relies on the correctness of quantum physics. This
work was accomplished in part while Crépeau was visiting Aarhus University (Denmark) in
the summer.


Crépeau is currently completing his Ph.D. thesis, which will cover some recent material selected
from the above papers. He is expected to defend his thesis during the summer.


Zero-knowledge protocols are another of Crépeau’s favorite research topics. While visiting
IBM Almaden Research Center last summer, he contributed two papers on this subject
[65][62]. These two papers are follow-ups to [61], in that they are concerned with a
model where the prover involved in the protocol is computationally bounded.


Aditi Dhagat 


During fall 1988, Dhagat was a teaching assistant for the graduate course in theory of com- 
putation taught by Sipser. During the year, she worked with Sipser in complexity theory 
and cryptography, trying to construct a pseudorandom number generator secure against 
monotone circuits without any unproven assumptions. In the process, they looked at mono- 
tone statistical tests and showed that there exist exponential size monotone statistical tests 
which break the security of the Nisan-Wigderson generator based on parity. They have also 
shown that if there exist monotone functions which are hard to approximate for polynomial 
size monotone circuits, then there exist pseudorandom number generators secure against 
polynomial size monotone circuits. 


Dhagat plans to continue to work on this question during the summer of 1989. 


Michael Ernst 


Ernst became a graduate student at MIT in January 1989. He worked under Meyer’s su- 
pervision to prove a monotone model adequate for recursive program schemes. In order 
to prove adequacy, most proofs in the literature directly use a stronger continuous model 
which simplifies the proof and which implies the weaker result; the typical approach is via 
Tait’s method of computability [265][296]. The introduction of continuity is poorly moti- 
vated from an expository and pedagogical viewpoint; we would hope to be able to show the 
result directly [233]. 


Ernst and Meyer [100] found that this was not possible; although they were able to produce 
a clear exposition of the concept, at one crucial point continuity was required. While the 
result holds for the monotone model without mention of continuity, a weaker assumption of 
monotonicity in the proof leads to a failure of the result. 


Ernst spent much of 1989 finishing up his undergraduate requirements; he plans to get 
started on his SM thesis during the upcoming year. 


Lance J. Fortnow 


Working with Sipser, Fortnow examined the relationship between probabilistic polynomial
time and probabilistic linear time. They showed [112] the existence of an oracle under which
the two classes are identical. This result means the techniques of separating the deterministic
and nondeterministic time hierarchies will not work for probabilistic computation. They also
show many other results relating to probabilistic computation and linear time.


During the spring of 1989, Fortnow spent the semester writing his thesis [111] and looking
for a job.


Jeff Fried


Fried continued research on the architecture, design, and analysis of communication networks 
for use in parallel computers and telecommunications. He completed a Master’s thesis [117], 
supervised by Leiserson, which includes two switch designs for such networks [120][116].
Followup work in this area has included an improved circuit design for one of the VLSI 
functions used in these designs [118], and a study of some of the modularity tradeoffs found 
in sparse circuit-switched interconnection networks [119]. 


Fried is currently working on a number of problems related to the architecture and control 
algorithms needed for high performance communication networks. This work includes a 
study of the impact of synchrony on the performance of distributed algorithms, and design 
studies of a VLSI packet router for use in broadband networks [115]. 


Sally A. Goldman 


Goldman has been working with Rivest on studying learning algorithms for concepts that 
have polynomial sized instance spaces [129][130]. They have focused on polynomial prediction 
algorithms in which the learner predicts a value for each entry in the instance space and 
then receives feedback as to whether the prediction was correct. They consider the worst 
case mistake bounds under several models for the selection of the instances. Often, good 
mistake bounds are obtained by the halving algorithm. They discuss an approximate halving 
algorithm and show how a fully polynomial randomized approximation scheme can be used
to implement (with high probability) the approximate halving algorithm. They demonstrate 
these techniques on the problem of learning a total order on a set of n elements. 
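

The halving algorithm mentioned above can be sketched as follows for a small, explicitly
enumerated concept class: predict with the majority of the concepts still consistent with past
feedback, then discard the concepts that disagree with the revealed label. Each mistake removes
at least half of the remaining concepts, so the number of mistakes is at most log2 of the class
size. The threshold concept class and all names below are hypothetical and chosen only for
illustration; this is not the approximate halving algorithm of [129][130].

```python
import math


def halving_learner(concepts, instances, target):
    """Run the halving algorithm and return the number of prediction mistakes."""
    version_space = list(concepts)
    mistakes = 0
    for x in instances:
        votes = sum(1 for c in version_space if c(x))
        prediction = 2 * votes >= len(version_space)  # majority vote, ties -> True
        truth = target(x)
        if prediction != truth:
            mistakes += 1
        # Keep only the concepts consistent with the revealed label.
        version_space = [c for c in version_space if c(x) == truth]
    return mistakes


# Hypothetical concept class: thresholds on {0,...,7}, c_t(x) = (x >= t).
concepts = [lambda x, t=t: x >= t for t in range(9)]
target = concepts[5]
mistakes = halving_learner(concepts, range(8), target)
mistake_bound = math.log2(len(concepts))
```

With the instances presented in increasing order, the learner errs only once (at x = 5),
well within the log2(9) bound.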


Goldman has also been working with Rivest and Schapire on the particular problem of 
learning a binary relation between n objects of one kind and m of another [130]. This can 
be viewed as the problem of learning an n x m binary matrix. Here, the instance space 
contains the elements of the matrix and is thus of polynomial size. They present numerous 
upper and lower bounds on the number of mistakes that prediction algorithms can make 
under different models for the selection of the instances. 


Goldman has also done some research in the field of computational geometry. In particular, 
she developed an algorithm to compute the greedy triangulation of an arbitrary point set 
that takes O(n^2 lg n) time and O(n) space [128]. In January, Goldman participated in the
robot building project led by Schapire.


Ronald I. Greenberg 


Greenberg worked on three main topics during the past year: networks for general pur- 
pose parallel computation, multi-layer channel routing, and bounds on the area for VLSI 
implementations of finite-state machines. 


Recent work on networks for general purpose parallel computation is reported in [136]. 
This paper provides several extensions and generalizations of earlier work on the problem of 
designing “universal” networks which can simulate any other network of comparable physical
size with only polylogarithmic overhead in simulation time. 


On the topic of multi-layer channel routing, Greenberg has been seeking improvements upon 
algorithms recently developed with Ishii and Sangiovanni-Vincentelli (University of Califor- 
nia, Berkeley) for the program MULCH [137]. The basic approach of MULCH is to divide 
a multi-layer problem into essentially independent subproblems of one, two, or three lay- 
ers. A main step in MULCH is to greedily partition the nets once a set of layer groups has 
been determined. As each net is considered, it is assigned to the group where the resulting 
subproblem seems to be the one requiring the least channel width. For testing the required 
channel width of single-layer partitions, Greenberg and Miller Maley (Princeton University) 
have devised algorithms which are more efficient than naive approaches involving complete
routing of the layer. Greenberg is also developing “incremental” algorithms to quickly de- 
termine the effect on certain subproblem characteristics when a new net is added, by taking 
advantage of knowledge derived from earlier computations on the subproblem. 


Finally, Greenberg and Mike Foster (Columbia U. and NSF) have derived lower bounds on 
the area required for VLSI layout of finite-state machines [113]. These lower bounds show 
that naive layout approaches are optimal in the worst case. 


Michelangelo Grigni 


Grigni is a third year graduate student supervised by Sipser. His thesis research considers the
construction of fast robust broadcasting networks, continuing work begun with Peleg [139]
of the Weizmann Institute. Current work with Bertsimas of the Sloan School extends their
recent result [50] on the suboptimality of the space-filling curve heuristic for the Euclidean
TSP problem. Other work with Bertsimas includes a survey of various NP-complete
problems. Grigni continues searching for new attacks on the matrix multiplication exponent
problem.


Carolyn M. Haibt 


Haibt spent most of the year on coursework, but also continued work with Tardos. They 
are currently working on algorithms for the generalized network flow problem. This is a 
generalization of the maximum flow problem, where each edge has an associated gain factor,
and flow is multiplied by this factor when it passes through an edge. 


Mark D. Hansen 


Hansen has been studying graph embeddings with applications to parallel processing
problems. In [153], he examines the problem of finding optimal geometric embeddings in the
plane and higher dimensional spaces. Given an undirected graph G with n vertices and a
set P of n points in R^2, the geometric embedding problem consists of finding a bijection from
the vertices of G to the points in the plane which minimizes the sum total of edge lengths of
the embedded graph. In general, this problem is NP-complete as it contains the Euclidean
traveling salesman problem as a special case. Hansen gives approximation algorithms for
embedding many of the important graphs studied in the theory of parallel computation. He 
presents fast algorithms for embedding d-dimensional grids in the plane which are within 
a factor of O(log n) times optimal cost for d > 2 and O(log^2 n) for d = 2. He also shows
that any embedding of a hypercube, butterfly, or shuffle exchange graph must be within an 
O(log n) factor of optimal cost. When the points of P are randomly distributed or arranged 
in a grid, he is able to use the results of Leighton and Rao [202] to give a polynomial time 
algorithm which can embed arbitrary weighted graphs in these points with cost within an 
O(log^2 n) factor of optimal.
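

To make the objective concrete, the sketch below computes the cost of an embedding (the total
Euclidean length of the embedded edges) and finds an optimal bijection by exhaustive search on
a tiny instance. It only illustrates the problem definition under assumed names; it takes time
factorial in n and is unrelated to the approximation algorithms of [153].

```python
import itertools
import math


def embedding_cost(edges, placement):
    """Sum of Euclidean lengths of all edges under a vertex -> point placement."""
    return sum(math.dist(placement[u], placement[v]) for u, v in edges)


def brute_force_embedding(vertices, edges, points):
    """Try every bijection from vertices to points and keep a cheapest one."""
    best_placement, best_cost = None, math.inf
    for perm in itertools.permutations(points):
        placement = dict(zip(vertices, perm))
        cost = embedding_cost(edges, placement)
        if cost < best_cost:
            best_placement, best_cost = placement, cost
    return best_placement, best_cost


# Hypothetical instance: a path a-b-c and three collinear points.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
points = [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0)]
placement, cost = brute_force_embedding(vertices, edges, points)
```

Here the cheapest placement puts the middle vertex b on the middle point (1, 0), for a total
edge length of 5.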


Hansen shows how the algorithms developed in [153] for geometric embeddings can be used
to give solutions which are within an O(log^2 N) factor of optimal to problems of performance
optimization for array-based parallel processors in the following areas: communication load
balancing, dynamic allocation of jobs to processors, reconfiguring around faults, and sim- 
ulating other architectures. He also indicates some applications to wafer scale integration 
problems and the dynamic configuration of distributed computing networks. 


Working with Leighton, Hansen was able to apply some of the techniques developed in [153]
to give an N^O(√N) time algorithm for solving the Euclidean traveling salesman problem. The
previous best running time for this problem was O(N^2 2^N). A year earlier Smith [293]
independently gave an algorithm with the same running time, using different techniques 
involving the Lipton-Tarjan planar separator theorem [211]. Hansen and Leighton are cur- 
rently investigating the possibility of developing practical heuristics for solving Euclidean 
TSP using the ideas in these two algorithms. 


Alexander T. Ishii 


Ishii completed his Master’s thesis [168], which describes his model for VLSI timing analysis.
The model maps continuous data domains, such as voltage, into discrete, or digital, data 
domains, while retaining a continuous notion of time. The majority of the thesis concentrates 
on developing lemmas and theorems that can serve as a set of “axioms” when analyzing 
algorithms based on the model. Key axioms include the fact that circuits in the model
generate only well defined digital signals, and the fact that components in the model support
and accurately handle the “undefined” values that electrical signals must take on when they 
make a transition between valid logic levels. In order to facilitate proofs for circuit properties, 
the class of computational predicates is defined. A circuit property can be proved by simply 
casting the property as a computational predicate. 


Ishii has also been working with Greenberg and Sangiovanni-Vincentelli of Berkeley on a
multi-layer channel router for VLSI circuits, called MULCH [137]. While based on the
CHAMELEON system developed at Berkeley, MULCH incorporates the additional feature
that nets may be routed entirely on a single interconnect layer (CHAMELEON requires the
vertical and horizontal sections of a net be routed on different interconnect layers). When
used on sample problems, MULCH shows significant improvements over CHAMELEON in
area, total wire length, and via count.


Ishii continued work, begun with Maggs, on a new VLSI design for a high speed multiport
register file. Design goals include short cycle time and single-cycle register window context 
changes. This research began as an advanced VLSI class project, under the supervision of 
Knight of the MIT Artificial Intelligence Laboratory. 


Lalita A. Jategaonkar 


Jategaonkar has been working jointly with Meyer on further developing research begun last 
year at Bell Laboratories with Mitchell. In [170], Jategaonkar and Mitchell develop an 
extension of the programming language ML in which a restricted object-oriented style can 
be achieved. In keeping with the framework of ML, a type derivation system and a type
inference algorithm are presented. It is proved that the algorithm is sound and complete
with respect to the type derivation system, and that it infers a most general typing of every 
typeable expression in the language. This research will comprise Jategaonkar’s forthcoming 
Master’s thesis. 


In order to show that the type derivation system is “reasonable” in a precise, technical 
sense, Jategaonkar and Meyer have been developing an interpreter for this language. They 
aim to show that the interpreter satisfies certain desirable properties, and that the interpreter 
and the type derivation are well matched in the sense that no typeable expression in the 
language reduces to a type error. Jategaonkar is also interested in further extending ML to 
support subtyping of abstract types and recursive types. Another direction of research she 
is interested in pursuing is to develop a semantics for these extensions of ML. 


Trevor Jim 


Jim entered the department in September 1988. His previous work with Appel [16] on a 
novel code generator for the language ML was presented at POPL ’89 in January. 


Under the direction of Meyer, he has been studying the work of Berry and Curien [90][49]
on models of PCF [265] based on stable functions and sequential algorithms. These models
were developed as alternatives to the standard model, which contains troublesome
“non-sequential” elements. Jim is trying to find extensions of PCF for which the alternate models
are fully abstract.


Joe Kilian 


Kilian spent most of his time working on his thesis, “Randomness in Algorithms and
Protocols” [176], which he recently completed. He also did some work in efficient zero-knowledge
interactive proofs, bounded interaction zero-knowledge proofs, noninteractive zero-knowledge
proofs, multi-prover zero-knowledge proofs, space-bounded secure protocols, communication
lower bounds for secret sharing, and IP vs. AM.


A troubling issue in theoretical cryptography is the chasm between what is efficient in theory
and what is efficient in practice. One area in which this gap is particularly large is in
zero-knowledge proofs for NP predicates. Suppose one wishes to prove in zero-knowledge that
some circuit, C(x_1,...,x_n), is satisfiable. The previously most efficient solutions to this
problem ([61][167]) required the prover and the verifier to send O(k|C|) bits back and forth
per iteration of the protocol. Here, |C| denotes the number of gates in the circuit C, and k
denotes the security parameter. Using pseudorandom generators, Kilian [177] has exhibited a
protocol in which the prover and the verifier communicate only O(|C| + k^2) bits per iteration
of the protocol. In real life circumstances, |C| is likely to be very large, in which case this
protocol should behave better in practice as well as in theory. 


Zero-knowledge proofs typically require a great deal of interaction between the prover and 
the verifier. It is of both theoretical and practical interest to see how much interaction is 
truly needed, which led to the notions of bounded interactive protocols and noninteractive 
protocols with a common random string. In bounded interaction protocols, the prover and 
the verifier interact for time polynomial in the security parameter. After the interaction 
phase, the prover proves theorems to the verifier by sending him a letter in the mail. In a 
noninteractive protocol with a common random string, the prover and verifier do not interact 
at all, but are both presented with a uniformly distributed string of length polynomial in 
the security parameter. 


Prior to Kilian’s work, there existed three proposed protocols for these scenarios, due to 
Blum-Feldman-Micali [58], De Santis-Persiano-Micali [93], and Micali-Ostrovsky [250].


Kilian developed a very simple and efficient protocol for bounded interaction zero-knowledge 
proofs, and a provably secure protocol for noninteractive zero-knowledge with a common 
random string. Both of these protocols’ security is based on reasonable cryptographic as- 
sumptions. His protocol for bounded interaction zero-knowledge proofs is more communica- 
tion efficient than the best previously known interactive zero-knowledge protocols. In both 
of these protocols, the prover can prove polynomially many polynomial-sized theorems. 


In [42], Kilian along with Ben-Or, Goldwasser, and Wigderson, developed a multiprover 
generalization of interactive proof systems. They showed that, informally, anything two 
provers could prove, they could prove in statistical zero-knowledge. Recently, Kilian has 
strengthened this result, showing that anything two provers could prove, they could prove 
in perfect zero-knowledge. 


With Nisan, Kilian applied knowledge complexity notions from cryptography to
space-bounded automata [179]. They developed protocols in this scenario for a number of
cryptographic tasks: secret key exchange, bit commitment, secure circuit evaluation, and
zero-knowledge proofs. In the space-bounded scenario, the security of these protocols may be
proven without any assumptions whatsoever. Furthermore, these protocols are robust against 
adversaries who have asymptotically more space than used by the good players. 


Nisan and Kilian also investigated upper and lower bounds for secret sharing. They consider
schemes in which a bit b is shared among n players, such that


1. A majority of the n players can reconstruct b; and


2. A nonmajority of the players cannot reconstruct any information about b. 


They show a lower bound of Ω(n log n) on the total number of bits that must be distributed
amongst the n players. They also consider a weakened form of secret sharing, in which 2n/3
players can reconstruct b, and n/3 players learn nothing. They use coding theory to prove
the existence of secret sharing schemes that are more efficient than the lower bounds proven 
for the more stringent conditions. 
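

For concreteness, one standard way to meet conditions 1 and 2 is a polynomial threshold
scheme in the style of Shamir, sketched below in Python. This is a swapped-in textbook
construction, not the schemes or bounds of Nisan and Kilian; the prime P, the function names,
and the choice of threshold are assumptions made for illustration. Each share is a field
element, and the field must have more than n elements, so the shares total on the order of
n log n bits.

```python
import secrets

P = 257  # small prime field, chosen only for illustration (requires n < P)


def share_bit(b, n):
    """Split bit b into n shares so that any majority (t = n // 2 + 1) recovers it."""
    t = n // 2 + 1
    # Random polynomial of degree t - 1 whose constant term is the secret bit.
    coeffs = [b] + [secrets.randbelow(P) for _ in range(t - 1)]

    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange-interpolate the polynomial at 0 from at least t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


shares = share_bit(1, 7)             # 7 players, threshold 4
recovered = reconstruct(shares[:4])  # any 4 shares suffice
```

Any four of the seven shares determine the degree-3 polynomial and hence the bit, while three
or fewer shares are consistent with either value of the bit.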


A classic theorem of Goldwasser and Sipser [135] states that IP=AM. In other words, public 
coins are as powerful as private coins for interactive proof systems. Kilian found a very 
simple proof of this fact, using random selection techniques from [133]. This proof will be 
included in a paper on random selection, with Oded Goldreich, Johan Hastad, and Yishay 
Mansour. 


Shlomo Kipnis 


Kipnis has been investigating parallel architectures and interconnection networks. He is 
trying to further explore the power of bussed interconnection schemes in routing
permutations and realizing various communication patterns. Bussed interconnection schemes and
their relation to difference covers were explored by Kilian, Kipnis, and Leiserson in [178]. In
addition, he is investigating various arbitration schemes for bus-based architectures.


Recently, he studied the problem of range queries in computational geometry. Answering range
queries is a fundamental problem in computational geometry with applications to computer graphics
and database retrieval systems. He compiled a survey report on three different methods for 
range queries in computational geometry [180]. 


Richard R. Koch 


Koch’s Ph.D. thesis [183] is a probabilistic analysis of routing on a parallel architecture.
Koch analyzes the bandwidth of the butterfly network. In a dilated butterfly network, nodes
are connected by parallel edges instead of just one edge as in the usual butterfly network.
He proves a previous conjecture that the expected bandwidth of an N node dilated butterfly
network is O(N (log N)^(-1/q)), where q is the number of parallel edges. He explores some
implications of his results for design tradeoffs. He also develops interesting techniques for
finding asymptotics for nonlinear systems of recurrences, and many of the results appeared
in [182].


In [184], Koch, Leighton, Maggs, Rao, and Rosenberg study the problem of emulating T_G
steps of an N_G-node guest network on an N_H-node host network. Although many isolated
emulation results have been proved for specific networks in the past, and measures such as 
dilation and congestion were known to be important, the field has lacked a model within 
which general results and meaningful lower bounds can be proved. They attempt to provide 
such a model, along with corresponding general techniques and specific results in this paper. 


Dina Kravets


Kravets spent most of the year working with Aggarwal and Park on problems in compu- 
tational geometry. In January, she finished her Master’s thesis [191] which included the 
following results: 


1. An algorithm to find all the farthest neighbors of every vertex on a convex n-gon in
O(n) time.


2. An O(n^2) algorithm to sort the distances of all the vertices of a convex n-gon with
respect to each vertex of the convex n-gon. 


3. An O(kn log k) time algorithm to find k farthest vertices for every vertex of a convex 
n-gon. 


4. A worst-case optimal algorithm to sort a set of numbers given lower bounds on the
ranks. 


The first of these algorithms appeared in Information Processing Letters [12]. Park and
Kravets are planning to improve the third result and submit it to the ACM-SIAM Symposium 
on Discrete Algorithms. 


Kravets is also looking at some problems in parallel computation and VLSI with Leighton. 


Leonid A. Levin 


The topic of Levin’s research in 1988-89 may be called “Randomness in Computing.” In 
[206], Levin and Venkatesan propose the first intractability results for random instances 
of NP problems. NP-complete problems should be hard on some (maybe extremely rare) 
instances. Generic instances of many such problems proved to be easy. This paper shows 
the intractability of random instances of a graph coloring problem. Applications of average 
case intractability are considered in two other papers: [132][166]. 


Blum and Micali [59] discovered permutations f with “hard-core” predicates b(x) that
cannot be efficiently guessed from f(x) with a noticeable correlation. Both b and f are easy to
compute. Yao [314] modifies any one-way permutation f into f* which has a hard-core
predicate. Its security may be lower than any constant power of the security of f and is too 
small for practical applications. Goldreich and Levin [132] prove that most linear predicates 
are hard-cores for every one-way function and have almost the same security. The result 
extends to multiple (up to the logarithm of security) hidden bits and has wide applicability 
to pseudorandomness, cryptography, etc. 
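

A linear predicate of the kind referred to above can be written as the inner product modulo 2
of the input x with a fixed bit string r. The Python sketch below, with bit strings represented
as integers and names chosen for illustration, shows the predicate and checks the linearity
property b(x XOR y, r) = b(x, r) XOR b(y, r). It illustrates only the form of the predicate
and says nothing about one-wayness or security.

```python
def linear_predicate(x, r):
    """Inner product of the bit strings x and r modulo 2 (both given as ints)."""
    return bin(x & r).count("1") % 2


def is_linear_in_x(r, nbits):
    """Check b(x ^ y, r) == b(x, r) ^ b(y, r) for all nbits-bit x and y."""
    return all(
        linear_predicate(x ^ y, r) == linear_predicate(x, r) ^ linear_predicate(y, r)
        for x in range(2 ** nbits)
        for y in range(2 ** nbits)
    )


# Hypothetical 4-bit example: x = 1011, r = 0110 share exactly one set position.
bit = linear_predicate(0b1011, 0b0110)
linear_ok = is_linear_in_x(0b0110, 4)
```

In this example the predicate evaluates to 1, and the exhaustive check over all 4-bit inputs
confirms linearity.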


Let an easily computable function f be one-way, i.e., for most x one cannot recover from f(x)
either (1) x by a polynomial time algorithm, or (2) an x' ∈ f^(-1)(f(x)) by a polynomial size
circuit. In case (1), to exclude useless f(x) = 0, the difference between Shannon entropies of
inputs and outputs of f is restricted to O(1). Impagliazzo, Levin, and Luby [166] show, based
on [132], that the existence of one-way functions in the sense (1) and (2) is necessary and 
sufficient for the existence of pseudo-random generators secure against feasible algorithms 
or circuits, respectively. 


In [205], Levin compares probability distributions of computational objects. The usual distri- 
butions are concentrated on strings that differ little in any fundamental characteristic, except 
their informational size (Kolmogorov complexity). This property distinguishes a class of ho- 
mogeneous probability measures suggesting various applications. In particular, it explains 
why the average case NP-completeness results are so measure independent, and offers their 
generalization to this wider and more invariant class of measures. It also demonstrates a 
sharp difference between pseudo-random strings and the objects known before. 


Bruce Maggs 


Maggs is studying the ability of a host network to emulate a possibly larger guest network 
[184]. His collaborators in this research are Koch, Tom Leighton, Rao, and Rosenberg. An 
emulation is work-preserving if the work (processor-time product) performed by the host is 
at most a constant factor larger than the work performed by the guest. Such an emulation is 
efficient because it achieves optimal speedup over a sequential emulation of the guest. Many 
work-preserving emulations for particular networks have been discovered. For example, the 
N-node butterfly can emulate an N log N node shuffle-exchange graph and vice versa. On 
the other hand, a work-preserving emulation may not be possible unless the guest graph is 
much larger than the host. For example, a linear array cannot perform a work-preserving 
emulation of a butterfly unless the butterfly is exponentially larger than the array. These positive 
and negative results provide a basis for comparing the relative power of different networks. 


Maggs is also studying algorithms for routing packets on faulty bounded-degree networks. 
With Leighton, he developed a scheme for routing N packets on an N-node multibutterfly 
network [303] in O(log N) steps even in the presence of many faulty nodes. 


Yishay Mansour 


Yishay Mansour has continued studying data transmission in communication networks. In 
work with Schieber [229], he shows lower bounds for communication over non-FIFO 
links. In work with Herzberg and Goldreich [131], he gives a randomized protocol for 
communication over non-FIFO links. In work with Awerbuch and Shavit [31], he shows 
how to achieve polynomial end-to-end communication. 


In work with Linial and Nisan [208], he investigates constant depth circuits using the Fourier 
transform. They are able to show a quasi-polynomial time algorithm for learning this class. 
Another work that is connected to learning is [40]. 


In work with Schieber and Tiwari [231], he continues to develop techniques to prove lower 
bounds for integer computations. The work with Schieber and Tiwari [230] explores 
the complexity of approximating algebraic functions. In this work, techniques taken from 
approximation theory are used to derive lower and upper bounds. 


Mark J. Newman 


Newman continued work on fault-tolerant strategies for parallel computation. With Hastad 
and Leighton [156], he demonstrated algorithms for reconfiguring hypercubes with faulty 
components. After reconfiguration, the hypercubes retain all computational power (within 
constant factors). The algorithms are successful with high probability, given that nodes 
and edges fail independently and with constant probability. They also showed how to route 
permutations on hypercubes even if a constant fraction of the cube’s components have failed. 


With Leighton, Ranade, and Schwabe [201], Newman also showed how a dynamically chang- 
ing binary tree can be embedded in a hypercube so that computational and communication 
overhead are low. Specifically, they produced randomized algorithms which embed any grow- 
ing and shrinking binary tree so that the resulting simulation requires only constant factor 
overhead, with high probability. 


Noam Nisan 


Nisan arrived as a postdoc in the theory group in January 1989. He has been working mainly 
on problems related to complexity theory. 


Together with Babai and Szegedy [33], he proved lower bounds for the multiparty com- 
munication complexity of certain simple functions. These bounds were used to obtain a 
pseudorandom generator for Logspace without relying on any unproven assumptions. 


In [245], Nisan obtained a full characterization of the parallel time needed to compute any 
boolean function on a CREW PRAM in terms of the function’s decision tree complexity. 


In joint work with Linial [210], the question of obtaining approximate versions of the 
inclusion-exclusion formula is tackled. Tight upper and lower bounds are proved for sev- 
eral formulations of this question. 
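

For orientation, the sketch below shows the simplest way to approximate the inclusion-exclusion formula: keep only the terms for intersections of at most k sets. This is plain truncation (the Bonferroni bounds), not the approximation scheme of [210]; the function name and the example sets are hypothetical.

```python
from itertools import combinations


def truncated_inclusion_exclusion(sets, k):
    """Estimate |A_1 ∪ ... ∪ A_m| using intersections of at most k sets.

    With k equal to the number of sets this is the exact formula; for
    smaller k, odd k gives an upper bound and even k a lower bound.
    """
    total = 0
    for size in range(1, k + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(sets, size):
            total += sign * len(set.intersection(*group))
    return total


if __name__ == "__main__":
    family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
    for k in range(1, len(family) + 1):
        print(k, truncated_inclusion_exclusion(family, k))
```

On this small family the estimates alternate around the true union size and become exact once all three sets are intersected.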


Nisan and Kilian [179] considered cryptographic protocols in the setting where all parties 
are space-bound. In this setting, they design secure protocols for a wide spectrum of crypto- 
graphic problems. The security of these protocols is proved without relying on any unproven 
assumptions. 


In his joint work with Linial and Mansour [208], constant depth circuits are studied in terms 
of their Fourier transform. It is shown that almost all of the power spectrum of a function in 
AC^0 lies in the low coefficients. This fact is used to obtain a learning algorithm for constant 
depth circuits, as well as several other results. 
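

To make the notion of a power spectrum concrete, the following sketch computes the Fourier coefficients of a boolean function by brute force and sums their squares by degree. It is only an illustration under stated assumptions: functions take values +1 and -1, the helper names are hypothetical, and the exhaustive 4^n-time computation is not the learning algorithm of [208].

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: coefficient} for f: {0,1}^n -> {+1,-1}, with S a 0/1 tuple.

    The coefficient for S is the average over x of f(x) * (-1)^(S . x).
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = 0
        for x in points:
            parity = sum(si & xi for si, xi in zip(s, x)) % 2
            total += f(x) * (-1 if parity else 1)
        coeffs[s] = total / len(points)
    return coeffs


def weight_by_degree(coeffs, n):
    """Sum the squared coefficients of each degree |S| = 0, 1, ..., n."""
    weights = [0.0] * (n + 1)
    for s, c in coeffs.items():
        weights[sum(s)] += c * c
    return weights


def and3(x):
    """AND of three bits, encoded as -1 for true and +1 for false."""
    return -1 if all(x) else 1


def parity3(x):
    """Parity of three bits, encoded as -1 for odd and +1 for even."""
    return -1 if sum(x) % 2 else 1


if __name__ == "__main__":
    print(weight_by_degree(fourier_coefficients(and3, 3), 3))
    print(weight_by_degree(fourier_coefficients(parity3, 3), 3))
```

The AND function puts most of its weight on low degrees, while parity, which is not in AC^0, puts all of its weight on the top coefficient.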


Marios C. Papaefthymiou 


Papaefthymiou began his studies as a graduate student at MIT in September 1988. He is 
working on his SM thesis under the supervision of Leiserson. 


His research focuses on the design of efficient algorithms for pipelining of combinational 
circuitry. A general framework for this problem has been given by Leiserson and Saxe [204]. 


Papaefthymiou has given an optimal O(E) algorithm for minimum latency pipelining of 
combinational circuitry with constrained clock period. He is also investigating methods for 
pipelining combinational circuitry using a minimum number of registers. 


James K. Park 


James K. Park spent most of the last year collaborating with Aggarwal (IBM, Yorktown 
Heights) and Kravets on a number of problems relating to totally monotone arrays. Park’s 
work with Aggarwal (described in [13][14], and another manuscript, “Parallel Searching in 
Multidimensional Monotone Arrays,” currently in preparation) centers on the problem of 
finding maximum entries in totally monotone arrays and applications of efficient sequential 
and parallel algorithms for this problem to problems in computational geometry, dynamic 
programming, string matching, and VLSI river routing. This work generalizes and extends 
the results of [11]. (Park’s Master’s thesis [254], finished in January, is also on this sub- 
ject.) Park’s work with Kravets (described in Kravets’ Master’s thesis [191]) considers two 
more comparison problems, sorting and computing order statistics, in the context of totally 
monotone arrays and applications of efficient solutions to these problems. 
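

The row-maxima problem mentioned above can be sketched with a simple divide-and-conquer search. This is a textbook baseline, not the algorithms of [11], [13], or [14]; it assumes only that the column of each row's leftmost maximum never decreases from one row to the next, and the example array is hypothetical.

```python
def row_maxima(a):
    """Return, for each row of a, the column of its leftmost maximum.

    Assumes the leftmost-maximum column is nondecreasing in the row index.
    Solving the middle row first confines the rows above it to columns on
    the left and the rows below it to columns on the right, so roughly
    O((m + n) log m) entries are inspected for an m x n array.
    """
    m, n = len(a), len(a[0])
    argmax = [0] * m

    def solve(top, bottom, left, right):
        if top > bottom:
            return
        mid = (top + bottom) // 2
        best = left
        for j in range(left, right + 1):
            if a[mid][j] > a[mid][best]:  # strict test keeps the leftmost maximum
                best = j
        argmax[mid] = best
        solve(top, mid - 1, left, best)
        solve(mid + 1, bottom, best, right)

    solve(0, m - 1, 0, n - 1)
    return argmax


if __name__ == "__main__":
    xs = [1, 4, 6, 9]
    ys = [0, 2, 5, 7, 10]
    # Entry (i, j) is largest when ys[j] is closest to xs[i]; with both lists
    # sorted, the maxima move weakly to the right from row to row.
    array = [[-(x - y) ** 2 for y in ys] for x in xs]
    print(row_maxima(array))
```

A brute-force scan of every row gives the same answer and is a convenient check on small inputs.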


In the coming year, Park plans to continue his research relating to totally monotone arrays 
and computational geometry. 


Cynthia A. Phillips 


Phillips developed an O(lg^2 n)-time, (n + e)/lg n-processor deterministic parallel algorithm 
to contract general n-node, e-edge graphs to a single node. This algorithm is used as a 
subroutine in an algorithm developed with Leiserson to contract n-node bounded-degree 
graphs in O(lg n + lg^2 γ) time with high probability, where γ is the maximum genus of 
any connected component. A deterministic version runs in time O(lg n lg* n + lg^2 γ). The 
algorithm for bounded-degree graphs uses n/lg n processors [262]. The contraction algorithm 
can be used to solve the connected-components, biconnected-components, and spanning-tree 
problems. 


Phillips, with Zenios of the University of Pennsylvania, completed a preliminary experi- 
mental study of the solution of large assignment problems on the Connection Machine (TM) 
multiprocessor. The assignment problem is also known as maximum-weight bipartite match- 
ing. They developed heuristics to improve sequential “tail” behavior which seems to limit 
the usefulness of many current parallel algorithms for the assignment problem and related 
flow problems [263]. 


Phillips will be writing her thesis this summer. Among the new research that will probably 
be included is an analysis of the permutation distribution of the Benes network. In other 
words, how many distinct ways can the switches of a Benes network be set to yield a given 
permutation? If the permutations are well distributed, then pseudorandomly setting the 
switches of a Benes network may yield a good pseudorandom permutation network. 


Satish B. Rao 


In [184], Koch, Leighton, Maggs, Rao, and Rosenberg study the problem of emulating T_G 
steps of an N_G-node guest network on an N_H-node host network. Although many isolated 
emulation results have been proved for specific networks in the past, and measures such as 
dilation and congestion were known to be important, the field has lacked a model within 
which general results and meaningful lower bounds can be proved. In this paper, they attempt 
to provide such a model, along with corresponding general techniques and specific results. 


Leighton and Rao have developed an approximate min-cut max-flow theorem for a type of 
multicommodity flow problem. This theorem yields an approximation algorithm for finding 
a separator in arbitrary graphs that costs at most O(log? n) times the optimal. They also 
used the theorem to show that any permutation can be routed on an arbitrary network so 
that the congestion of any edge and path length of any message is within an O(log n) factor 
of optimal. In joint work with Maggs, they explore the problem of scheduling messages on 
paths with given congestion and length so that the routing time is minimized. 


Jon G. Riecke 


Riecke continues to work in the area of semantics and logic of programming languages, with 
two primary interests: the semantics of continuations, and the theory of “lazy” (call-by- 
name) functional languages. Working jointly with Meyer, he investigated some seemingly 
known (but undocumented) problems in the theory of continuations. More specifically, 
Meyer and Riecke showed that either programming with continuations explicitly or using 
special “continuation-accessing” operators (e.g., Scheme’s call/cc) leads one to conclude 
different facts about code; old equivalences between programs may no longer hold in a 
setting with continuations. The implications of these results and their precise statements 
are reported in [234] and in Riecke’s SM thesis [273]. 


The theory of lazy languages, begun by Abramsky and Ong, has also become a focus of 
Riecke’s work. Lazy functional languages pass arguments by name (that is, arguments are 
not evaluated before passing), but nevertheless stop evaluating higher-order expressions 
(functions) when they can build a closure. The usual Scott-style semantics do not predict this 
termination behavior correctly: a divergent functional and a closure that always diverges have 
the same meaning. Bloom and Riecke [54] developed a model for a typed lazy language that 
accurately reflects the behavior of the interpreter. Cosmadakis and Riecke (in a forthcoming 
paper) used the model to develop principles for reasoning about lazy programs, and proved 
that equalities between terms in a fragment of the language are decidable. 


In the past year, Riecke has also become interested in intuitionistic logic and type theory, 
and their applications to the theory of programming languages. He will continue his reading, 
as well as pursuing previous lines of research. 


Phillip Rogaway 


Rogaway is a third year graduate student working under Micali. He has been working on 
cryptography and complexity theory. 


Rogaway’s Master’s thesis evolved into the CRYPTO-88 paper which easily won the award 
for most coauthors [41]. This paper establishes that an injective one-way function suffices to 
prove all of IP in computational zero-knowledge. It also shows that the “envelope model” 
for bit commitment suffices to show all of IP has perfect zero-knowledge proofs. 


Rogaway investigated generalized notions of knowledge complexity, e.g., protocols that re- 
lease a “small” (but nonzero) amount of information. Recently he has been working on 
reducing the interaction required for secure distributed computation. 


John Rompel 


In January, Rompel completed his Master’s thesis [277], based on approximation algorithms 
for graph coloring developed last year with Berger [45]. 


More recently, Rompel has been working on problems in the field of parallel algorithms. 
Rompel, together with Berger [46], developed a general framework for removing randomness 
from randomized NC algorithms whose analysis uses only polylogarithmic independence. 
Previously no techniques were known to determinize those RNC algorithms depending on 
more than constant independence. One application of their techniques is an NC algorithm 
for the set discrepancy problem, which can be used to obtain many other NC algorithms, 
including a better NC edge coloring algorithm. As another application of their techniques, 
they provided an NC algorithm for a hypergraph coloring problem. 


Rompel, working with Berger and Shor [47], gave NC approximation algorithms for the 
unweighted and weighted set cover problems. Their algorithms use a linear number of pro- 
cessors and give a cover that has at most log n times the optimal size/weight, thus matching 
the performance of the best sequential algorithms. Previously, there were no known parallel 
algorithms for the general set cover problem. Berger, Rompel and Shor devised a randomized 
algorithm, depending on only pairwise independence, and then converted it to a determin- 
istic one. Furthermore, they applied their set cover algorithm to learning theory, giving an 
NC algorithm to learn the concept class obtained by taking the closure under finite union or 
finite intersection of any concept class of finite VC-dimension which has an NC hypothesis 
finder. In addition, they gave a linear-processor NC algorithm for a variant of the set cover 
problem first proposed by Chazelle and Friedman, and used it to obtain NC algorithms for 
several problems in computational geometry. 
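

For reference, the sequential baseline that such parallel algorithms are measured against can be sketched as the classical greedy heuristic for unweighted set cover, which is known to stay within a logarithmic factor of the optimal cover size. This is not the NC algorithm of [47]; the function name and the example instance are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until the universe is covered.

    At each step, pick the subset covering the most still-uncovered elements
    (ties go to the lowest index).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


if __name__ == "__main__":
    universe = range(1, 7)
    subsets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
    print(greedy_set_cover(universe, subsets))
```

The loop is inherently sequential, since each choice depends on what the previous choices left uncovered, which is one reason a parallel algorithm with a comparable guarantee is of interest.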


Arie Rudich 


Rudich began his first year as a graduate student at MIT in September 1988. He is working 
on an SM thesis supervised by Meyer on dataflow theory which should be complete by 
January 1990. 


His research aims to generalize recent results by Rabinovich and Trakhtenbrot [269] and by 
Lynch and Stark [238], which precisely delimit the classes of dataflow networks for which 
Kahn’s “Least Fixed Point Principle” [173] applies, showing that Kahn’s Principle fails 
precisely where Brock-Ackerman-like anomalies [68] begin. Rabinovich and Trakhtenbrot 
established this boundary without distinguishing completed and incompleted output streams. 
Rudich aims to show that the results carry over to the more conventional model where the 
completed/incompleted distinction is maintained. 


Robert E. Schapire 


Schapire continued to work with Rivest on the problem of inferring an unknown finite-state 
automaton from its input/output behavior. In [274], they introduce a powerful new tech- 
nique, based on the inference of homing sequences, for solving this problem in the absence 
of a means of resetting the machine to a start state. Their inference procedures experiment 
with the unknown machine, and from time to time require a teacher to supply counterexam- 
ples to incorrect conjectures about the structure of the unknown automaton. In this setting, 
they describe a learning algorithm that, with probability 1 - δ, outputs a correct descrip- 
tion of the unknown machine in time polynomial in the automaton’s size, the length of the 
longest counterexample, and log(1/δ). They present an analogous algorithm that makes use 
of a diversity-based representation of the finite-state system. Their algorithms are the first 
that are provably effective for these problems in the absence of a “reset.” They also present 
probabilistic algorithms for permutation automata which do not require a teacher to supply 
counterexamples. For inferring a permutation automaton of diversity D, they improve the 
best previous time bound by roughly a factor of D^3/log D. 


In January, Schapire led a team participating in the robot building contest of the AI Lab’s 
“Winter Olympics.” The goal of their project was to build a robot capable of performing 
some simple learning task. In particular, the robot they built, named S’bot (for Smart- 
bot or Spotbot), was able to learn from experience how to avoid running into walls and 
other obstacles. Their team consisted of Amsterdam, Blum, Goldman, Moore, Rivest, and 
Schapire. 


Schapire has also been working with Goldman and Rivest on the problem of inferring a 
binary relation [130] between n objects of one kind and m of another. This can be viewed 
as the problem of inferring an n x m binary matrix. Their goal has been to minimize the 
number of prediction mistakes made by a learner presented with such a matrix one entry at 
a time. They have been able to prove numerous upper and lower mistake bounds for several 
variations of this problem. 


Finally, Schapire has been looking at problems relevant to the distribution-free (“pac”) 
learning model introduced by Valiant [304]. In [281], Schapire considers the problem of 
improving the accuracy of a hypothesis output by a learning algorithm. He shows that 
a model of learnability, called weak learnability, in which the learner is only required to 
perform slightly better than guessing, is as strong as a model in which the learner’s error 
can be made arbitrarily small. His result may have significant applications as a tool for 
efficiently converting a mediocre learning algorithm into one that performs extremely well. 


Leonard Schulman 


Schulman spent most of his time on coursework this year. In the spring he developed an 
algorithm for sorting n elements on an n-node ring of processors in the optimal time n/2. 
This requires only constant capacity at each node in the word model. Mansour proved a 
closely related lower bound and these two results have been combined in a joint paper to be 
submitted shortly. 


During the summer of 1989, Schulman intends to read under the guidance of Sipser. 


Eric J. Schwabe 


Schwabe has been working on problems involving the efficient implementation of dynamic 
structures on fixed-connection networks. In particular, he worked with Leighton, Newman, 
and Ranade (Berkeley) on the problem of dynamically embedding binary trees in butterfly 
and hypercube networks [201]. Randomized embedding algorithms were found for both 
networks which simultaneously optimize load (the maximum number of tree nodes mapped 
to a processor) and dilation (the maximum distance in the network between adjacent tree 
nodes) for trees which are a logarithmic factor larger than the host network. An improved 
algorithm for the hypercube was found which optimizes load and dilation for arbitrary binary 
trees, while also keeping congestion (the number of times a hypercube edge is ‘traced over’ by 
an embedded tree edge) low. Also, lower bounds were proved which show that deterministic 
algorithms cannot simultaneously optimize load and dilation. 
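

The two quality measures defined above are easy to state in code. The sketch below evaluates the load and dilation of a given embedding of a tree into a hypercube whose nodes are labeled by integers, so that network distance is Hamming distance. The function names and the sample embedding are hypothetical, and nothing here reproduces the embedding algorithms of [201].

```python
from collections import Counter


def load(embedding):
    """Maximum number of tree nodes mapped to a single hypercube node."""
    return max(Counter(embedding.values()).values())


def dilation(tree_edges, embedding):
    """Maximum hypercube (Hamming) distance between adjacent tree nodes."""
    return max(
        bin(embedding[u] ^ embedding[v]).count("1") for u, v in tree_edges
    )


if __name__ == "__main__":
    # A 7-node complete binary tree placed by hand in a 3-dimensional cube.
    tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
    embedding = {0: 0b000, 1: 0b001, 2: 0b010, 3: 0b011,
                 4: 0b011, 5: 0b110, 6: 0b111}
    print(load(embedding), dilation(tree_edges, embedding))
```

In this hand-made example two tree nodes share a processor and one tree edge spans two hypercube edges, so both measures equal 2.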


Schwabe has also been studying the relative strengths of the butterfly and shuffle-exchange 
graphs as interconnection networks. He proved that normal hypercube algorithms (those 
which use only one dimension of hypercube edges at a time, and adjacent dimensions in 
consecutive time steps) can be simulated on a butterfly network with only a constant 
slowdown, a result which was previously known only for the shuffle-exchange graph. A version 
of this result is being prepared for journal submission. In addition, he recently discovered a 
one-to-one embedding of the butterfly into the shuffle-exchange graph with constant dilation 
and congestion, and expansion 2^{O(√(log N))}, improving a result of Koch et al. [184]. 


Over the next year, Schwabe plans to work on relating the ideas in [201] to other problems 
in parallel memory management, and to continue his investigation of the shuffle-exchange 
vs. the butterfly. 


Alan Sherman 


Sherman (now faculty at Tufts University) has completed a monograph on the PI System for 
placement and interconnect of custom VLSI circuits [285]. The PI System was designed and 
implemented at MIT under the leadership of Rivest; Sherman was one of the key architects. 
The monograph is being published by Springer-Verlag. Beginning September 1989, Sherman 
will join the faculty at the University of Maryland, Baltimore County. 


Robert Sloan 


Sloan’s primary area of interest this past year was computational learning theory. His major 
activity for the year was preparing his doctoral dissertation [290]. Most of the other work in 
computational learning theory described here is also contained in that work. 


Much of his work was within Valiant’s model of probably approximately correct learning 
[304]. Working with Helmbold and Warmuth while visiting the University of California at 
Santa Cruz, he developed an algorithm for learning certain complex combinations of concept 
classes known to be learnable [159]. 


In [275], the problem of learning arbitrary boolean concepts in the Valiant model by break- 
ing them into pieces and learning one piece at a time is studied. In other work, Sloan studied 
the effects of different sorts of noise on learning in the Valiant model [288]. 


He explored an alternate model of inductive inference in [276]. 


Sloan also remains interested in the subject of cryptography, and spent some time studying 
different definitions of zero-knowledge [289]. 


Clifford Stein 


Stein has been working with Shmoys on developing parallel algorithms for combinatorial 
optimization problems. Together with Klein of Harvard, he developed a parallel algorithm 
to find a maximal set of edge disjoint cycles in an undirected graph in O(log n) time using m 
processors on a CRCW PRAM. A maximal set of edge disjoint cycles is a set of cycles whose 
removal from the graph renders the graph acyclic. Stein and Klein have also been able to 
generalize this result to multi-graphs and obtain an algorithm which runs in O(log n log C) 
time, where C is the largest multiplicity of any edge [181]. 


Using this algorithm, Stein has developed an algorithm which finds a cycle cover containing 
O(m + n log n) edges using O(log^2 n) time on m processors. A cycle cover is a set of cycles 
such that every edge in the graph appears in at least one cycle. 


Stein has observed that the parallel matching algorithms of [242] and [174] can be combined 
with scaling to achieve RNC algorithms for the assignment problem which use a number of 
processors independent of the size of the largest number in the problem, by slowing down 
the running time by a factor proportional to the logarithm of the size of the largest number 
in the problem. 


Stein has also been rewriting his undergraduate thesis [295] for publication. Together with 
Ahuja, Orlin, and Tarjan, he developed efficient algorithms for a wide variety of network 
flow problems in bipartite graphs. The main results are of the following form: given a 
bipartite graph with n nodes, but only n_1 nodes in the smaller half of the bipartition, an 
algorithm which runs in time O(f(n, m)) can be converted into an algorithm which runs in 
time O(f(n_1, m) + n_1 m). This approach leads to an algorithm for bipartite maximum flow 
which runs in O(n_1 m log(n_1^2/m + 2)) time, an algorithm for bipartite minimum cost circulation 
which runs in O(n_1 m log n_1 log(n_1 C)) time, and an algorithm for parametric maximum flow 
which solves l bipartite maximum flow problems in O(l n + n_1 m log(n_1^2/m + 2)) time. 


Margaret C. Tuttle 


Tuttle joined the Theory Group this year and has been working with Shmoys on approxima- 
tion algorithms for the Mixed Postman Problem: given a weighted graph G, find a least-cost 
tour of G which traverses each edge at least once. When G is totally directed or totally 
undirected, the problem can be solved in polynomial time. When G is a mixed graph (i.e., 
some edges are directed and some are undirected), the problem is NP-complete (as shown 
by Papadimitriou in 1976). 


This summer she will continue working with Shmoys. 


Joel Wein 


Wein has been working with Shmoys on parallel graph algorithms. He recently extended 
a result of Karloff’s to obtain a Las Vegas RNC algorithm for minimum weight perfect 
matching, where the weights are represented in unary. This problem was shown to be in 
RNC by Karp, Upfal, and Wigderson, but the algorithm was Monte Carlo in nature: it 
yielded a correct solution with high probability, but was unable to determine if the solution 
was indeed optimal. Wein developed a way to carry out this certification in RNC, yielding 
a robust Las Vegas algorithm that can verify optimality. The result utilizes a structure 
theorem of Sebo for the t-join problem and yields an RNC Las Vegas algorithm for that 
problem as well. 


Over the summer, Wein worked at Thinking Machines Corporation, developing practical 
Connection Machine implementations for various optimization problems. He intends to 
continue working on both practical and theoretical aspects of parallel computation. 


Su-Ming Wu 


Working with Tardos, Wu has developed an O(n?) algorithm for the problem of finding two 
edge-disjoint paths in a graph [312]. The basis for the algorithm is a graph-theoretic proof 
of Seymour (Bell Communications Research Laboratory). 


April 1989. 


[9] C. Crépeau. Verifiable disclosure of secrets and applications. Lecture given at Euro- 
crypt, April 1989. 


[10] M. D. Ernst. Polymorphic typechecking is exponential. Lecture given at Massachusetts 
Institute of Technology, April 1989. 


[11] J. Fried. Broadband module design: cost/performance tradeoffs. Lecture given at 
International Workshop on Physical Design of Broadband Switching and Multiplexing 
Equipment, April 1989. 


[12] S. A. Goldman. Learning binary relations and total orders. Lecture given at Center 
for Intelligent Control Systems Machine Learning Workshop, May 1989. 


[13] R. I. Greenberg. MulCh: A multi-layer channel router using one, two, and three layer 
partitions. Lecture given at Massachusetts Institute of Technology, May 1989. 


[14] R. I. Greenberg. Efficient multi-layer channel routing. Lecture given at Georgia 
Institute of Technology and University of Maryland, March-April 1989. 


[15] R. I. Greenberg. Area-universal networks. Lecture given at Polytechnic University, 
Princeton University, and University of Southern California, February-March 1989. 


[16] A. T. Ishii. MulCh: A multi-layer channel router using one, two, and three layer 
partitions. Lecture given at ICCAD-88, November 1988. 


[17] L. Jategaonkar. ML with extended pattern matching and subtypes. Lecture given at 
New York University and IBM Research, September 1988. 


[18] J. Kilian. Theory and practice of cryptographic primitives. Lecture given at University 
of California, Berkeley, and Stanford University, April 1989. 


[19] T. Leighton. Survey talk on networks, parallel computation and VLSI design. Lecture 
given at Trento School on VLSI Computation; ICALP (July); NCUBE and University 
of Oregon (January); Dartmouth (May), 1989. 


[20] T. Leighton. Survey talk on packet routing algorithms. Lecture given at IDA SRC 
(June 1989), U. British Columbia Distinguished Lecture Series, Stanford (December 
1988), ICSI Berkeley, IBM Almaden (January 1989), MIT Center for Intelligent Con- 
trol (March 1989), DARPA Contractors Meeting; NSF Industry-University Symposium 
(April 1989), DIMACS Symposium Invited Lecture (May 1989). 


[21] T. Leighton. Dynamic tree embeddings in butterflies and hypercubes. Lecture given 
at ICSI Berkeley, January 1989. 


[22] T. Leighton. Fast computation using faulty hypercubes. Lecture given at ACM STOC, 
May 1989. 


[23] T. Leighton. Flows, paths and VLSI layout. Lecture given at Bonn Workshop on 
Flows, Paths and VLSI Layout; and AWOC, June 1988. 


[24] T. Leighton, M. Newman, A. G. Ranade, and E. Schwabe. Dynamic tree embeddings 
in butterflies and hypercubes. Lecture given at MIT VLSI Research Review, May 1989. 


[25] C. E. Leiserson. Very large scale computing. Lecture given at MIT Project MAC 25th 
Anniversary Symposium, MIT LCS, October 1988. 


[26] C. E. Leiserson. VLSI theory and parallel supercomputing. Lecture given at Decen- 
nial Caltech Conference on VLSI, California Institute of Technology (March); Thinking 
Machines Corporation (April), 1989. 


[27] B. Maggs. Universal packet routing algorithms. Lecture given at IBM Thomas J. 
Watson Research Center, April 1989. 


[28] A. R. Meyer. An ultimate “Kahn Principle” for dataflow semantics. Lecture given at 
IBM Research Lab, Distinguished Lecture (January); University of Maryland (Febru- 
ary), 1989. 


[29] A. R. Meyer. Observing concurrent processes: dataflow. Lecture given at MIT, Project 
MAC 25th Anniversary Symposium, October 1988. 


[30] A. R. Meyer. Semantical paradigms. Lecture given at Third IEEE Symposium on Logic 
in Computer Science, Invited Lecture (July); Mitre Corporation (December), 1988. 


[31] J. Park. Notes on searching in multidimensional monotone arrays. Lecture given at 
29th Symposium on Foundations of Computer Science, October 1988. 


[32] J. G. Riecke. Observing termination in Scott-style semantics. Lecture given at IBM 
Research, November 1988. 


[33] R. L. Rivest. Learning theory: what’s easy and what’s hard. Lecture given at MIT, 
October 1989. 


[34] R. L. Rivest. Inference of finite automata using homing sequences. Lecture given at 
Boston University, March 1989. 


[35] R. E. Schapire. Inference of finite automata using homing sequences. Lecture given at 
21st Annual Symposium on Theory of Computing, May 1989. 


[36] R. E. Schapire. The strength of weak learnability. Lecture given at Northeastern 
University, Center for Intelligent Control Systems Machine Learning Workshop, 1989. 


[37] R. E. Schapire. Diversity-based inference of finite automata. Lecture given at GTE 
Laboratories, American Control Conference, 1988. 


[38] D. B. Shmoys. Jackson’s rule: making a good heuristic better. Lecture given at CWI, 
EURO/TIMS, 1988. 


[39] D. B. Shmoys. Using linear programming in the design and analysis of approxima- 
tion algorithms. Lecture given at Princeton (DIMACS Theory Day), Stanford, and 
Columbia, 1989. 


[40] D. B. Shmoys. Approximation schemes for constrained scheduling problems. Lecture 
given at Stanford, Oberwolfach, and Cornell, 1989. 


[41] C. Stein. Improved algorithms for bipartite network flow. Lecture given at MIT 
Laboratory for Computer Science, April 1989. 


[42] E. Tardos. Recent advances in the theory of network flow algorithms. Lecture given 
at Summer school on “Paths, Flows and VLSI-layout” at the Operations Research 
Institute, Bonn F.R.G., 1988. 


[43] E. Tardos. Combinatorial algorithms for the generalized circulation problem. Lecture 
given at University of Waterloo, Mathematical Programming Symposium in Tokyo, 
Rutgers University, Stanford, Cornell University, SIAM Workshop on Optimization, 
1988. 


14.1 Introduction 


The Theory of Distributed Systems Group has continued its work on algorithms and impos- 
sibility results for distributed problems, as well as its work on modeling, proof techniques, 
and applications. Particular highlights this year include our work on atomic registers, on 
real time systems, and on the design of a system for simulating distributed algorithms. 


14.2 Faculty Reports 


Nancy A. Lynch 


This year, Nancy Lynch worked on combinatorial results (and modeling) for asynchronous 
communication protocols. The paper [222] shows the impossibility of implementing reliable 
data link behavior in the face of certain assumptions about physical channels and about 
node failures. Besides the combinatorial results, another contribution of this paper is the 
style of problem specification, done in terms of I/O behavior of physical channels and data
links. She has continued this work by proving a related impossibility result for “oblivious” 
non-FIFO physical channels, and by simplifying the specifications used in [222]. Both of 
these efforts are still in progress. 


Other combinatorial work this year includes some new complexity results for real time com- 
puting (see below). Lynch completed revisions of two older papers [105] [69]. Also, she 
worked on the problem of processor renaming in an asynchronous system, mostly unsuc-
cessfully.


Lynch also worked on general models for concurrent systems. With Mark Tuttle, she wrote 
a short paper [226] introducing the I/O automaton model; a longer journal version is still 
planned. This model is used to redo a pair of prior results, done originally in other models 
[108][17]; the revisions appear to be somewhat simpler than the originals. Lynch also super-
vised Magda Nour’s SB thesis project, in which Magda established interesting connections
between the Unity model of Chandy and Misra and the I/O automaton model. Other work 
on modeling includes work on modeling real time systems (see below). 
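

As a rough illustration of the kind of object the I/O automaton model describes, the following Python sketch represents an automaton by a state, a set of input actions, a set of output actions, and a transition function, and composes automata by letting every component that shares an action take it together. The names `Automaton` and `compose_step` and the channel-and-sink example are assumptions made for this sketch only; the formal definitions are those of [226].

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

Action = Tuple[str, object]  # (action name, payload)


@dataclass
class Automaton:
    """Toy automaton: a state, an action signature, and a transition function."""
    state: object
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    # step(state, action) returns the next state, or None if the action is not enabled.
    step: Callable[[object, Action], Optional[object]]

    def enabled(self, action: Action) -> bool:
        return self.step(self.state, action) is not None

    def apply(self, action: Action) -> None:
        nxt = self.step(self.state, action)
        if nxt is None:
            raise ValueError(f"{action} not enabled")
        self.state = nxt


def channel_step(queue, action):
    name, msg = action
    if name == "send":                      # input: always enabled
        return queue + (msg,)
    if name == "receive" and queue and queue[0] == msg:
        return queue[1:]                    # output: enabled only for the head message
    return None


def sink_step(log, action):
    name, msg = action
    if name == "receive":                   # input of the sink, output of the channel
        return log + (msg,)
    return None


def compose_step(components, action):
    """Perform one action of the composition: all components sharing it move together."""
    name = action[0]
    involved = [a for a in components if name in a.inputs | a.outputs]
    if not involved or not all(a.enabled(action) for a in involved):
        return False
    for a in involved:
        a.apply(action)
    return True


channel = Automaton((), frozenset({"send"}), frozenset({"receive"}), channel_step)
sink = Automaton((), frozenset({"receive"}), frozenset(), sink_step)
system = [channel, sink]

compose_step(system, ("send", "a"))
compose_step(system, ("send", "b"))
compose_step(system, ("receive", "a"))
```

After these three steps the channel still holds "b" and the sink has logged "a". Fairness and internal actions, which the model also treats, are left out of the sketch.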


In addition, Lynch worked on using I/O automata to verify correctness of complicated con- 
current algorithms. This verification work includes the correctness proof [308] of the Gallager, 
et al. Minimum Spanning Tree algorithm [122], and the paper [309] on Drinking Philoso- 
phers. Lynch supervised some revisions of Russel Schaffer’s SB thesis on verifying atomic 
register protocols [280]. She also supervised Chris Colby’s SB thesis on verifying the cor-
rectness of the Peterson-Fischer tournament-structured mutual exclusion algorithm. Finally,
she used I/O automata in her consulting work at Apollo Computer, to verify the correct-
ness of a complex algorithm for managing highly available replicated data. This proof has
an interesting structure, based on multivalued abstraction mappings: first, a non-garbage-
collected version of the algorithm is proved correct, and then the “real” algorithm, using
garbage-collection of old updates, is proved correct using abstraction mappings relating it 
to the non-garbage-collected version.
Many of these proofs are done in a style and at a level of detail that make them suitable
candidates for machine verification; with John Guttag and Steve Garland, Lynch is exploring 
the possibility of using LP to perform such verification. 


Work also continued on the theory of atomic transactions, although at a slower pace than 
last year. This work is a series of papers on modeling and verifying different kinds of 
transaction-processing algorithms, culminating in a book [224] to tie them all together. This 
year, she carried out substantial revisions of [104] and [160] (not yet complete). Our papers 
on locking algorithms [104] and timestamp algorithms [18] appeared in conferences. Besides 
the revisions, our current work in progress includes results relating our theorems to those 
of the “classical theory” of database concurrency control, and results about algorithms that 
carry out concurrency control simultaneously at several different levels of data abstraction. 


Lynch began new work on a theory for real time systems (and more generally, for timing- 
based systems) as part of the ONR’s new initiative on real time computing. She gave a 
talk at the ONR Workshop on Real Time Systems in November, on “Modeling Real Time 
Systems.” This introductory talk showed how I/O automata, extended to include time as in 
[238], could be used to model the timing restrictions and requirements of real time computing 
systems. Working with Hagit Attiya, she continued to pursue some of the ideas in the talk, in 
particular, to study upper and lower bounds for combinatorial problems in real time systems. 
For example, in [23], they proved upper and lower bounds on both centralized and distributed 
versions of a timing-based variant of the mutual exclusion problem (which appeared in the 
real time literature as the “nuclear reactor problem”). Stephen Ponzio carried out related 
work (described below) on the Dining Philosophers problem. 


Our correctness proofs for the algorithms in [23] turned out to have a very interesting style, 
adapting standard proof techniques for proving safety properties (such as invariant assertions 
and abstraction mappings) for use in timing-based systems. We are currently working on 
developing these proof methods for timing-based algorithms in appropriate generality. 


Lynch gave the Keynote Address at the 1988 Symposium on Principles of Distributed Com- 
puting. The talk she gave was a survey of the many impossibility results that have been 
proved in this research field. Preparing for this talk was itself a major project of hunting 
down the results and classifying them. She has written a paper based on this talk [219], to 
appear in this year’s PODC proceedings. 


Also, Lynch put the final touches on the paper [194], which is a survey of the theory of 
distributed computing research area, and has recently written a new NSF proposal with 
Baruch Awerbuch. 


Research service activity this year included serving as editor of a special issue of IEEE 
Transactions on Computer Systems on parallel and distributed algorithms, and also as an 
editor for Information and Computation. Lynch also served on the selection committee 
for this year’s ACM Thesis Prize, and on the Program Committee for the annual ACM 
Symposium on Theory of Computing. 


Besides supervising her own research students, Lynch served as reader on thesis committees
for Radia Perlman at MIT, and for Lisa Higham at the University of British Columbia. 


With Ken Goldman, Lynch put together a set of course notes for her class on Distributed 
Algorithms [221]. 


Plans for the near future are to continue her work on combinatorial results, especially those 
involving timing-based computation, consensus and atomic objects. She also plans to finish 
her book on atomic transactions and to continue her work on modeling concurrent systems, 
including those that use timing assumptions and randomization. She will also continue to try 
to use the I/O automaton model to describe the semantics of other frameworks and languages 
for concurrent computation, and to prove correctness of difficult concurrent algorithms. 


14.3 Research Associate and Student Reports 


Hagit Attiya 


Research Associate Hagit Attiya worked with Michael Fischer, Da-Wei Wang, and Lenore 
Zuck, of Yale University, on the Sequence Transmission Problem [22]. Here, a processor 
should transfer a sequence of data items to another processor over an asynchronous channel. 
It was shown that this problem can be solved by a protocol using finite-sized messages, even
over an asynchronous channel that may reorder and delete messages.
This is in contrast with the results in [222], where it was shown that there is no bounded 
protocol for solving the related data link layer problem. 


The rest of the research done by Attiya during this period can be divided into two areas: 
timing-based problems, and wait-free coordination. 


Timing-based Problems 


This work, joint with Nancy Lynch, uses the timed I/O automata framework (introduced by
[238], see also [218]).


They considered a timing-based variant of the mutual exclusion problem. In this variant, 
only an upper-bound on the time it takes to release the resource is known, as opposed to 
receiving an explicit signal when the resource is released. Furthermore, the only mechanism 
to measure real time is an inaccurate clock, whose tick intervals take time between two 
known constant bounds. Upper and lower bounds on the response time of any algorithm 
solving this problem were proven, for algorithms where the control is either centralized or
distributed. 
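

To see why the inaccurate clock matters, consider a small worked calculation. It is only an illustration under stated assumptions, not one of the bounds proved in [23]: tick intervals last between `c1` and `c2`, the resource is released within `hold` time units, and a process begins counting at an arbitrary point inside a tick interval. The function names are hypothetical.

```python
import math
from fractions import Fraction


def ticks_to_wait(hold, c1):
    """Smallest number of observed ticks guaranteeing that `hold` time has passed.

    The first tick may arrive almost immediately after counting starts, so
    k observed ticks only guarantee (k - 1) * c1 of elapsed time.
    """
    return math.ceil(Fraction(hold) / Fraction(c1)) + 1


def worst_case_wait(hold, c1, c2):
    """Longest real time that counting those ticks can take (each tick within c2)."""
    return ticks_to_wait(hold, c1) * Fraction(c2)
```

With `hold = 10`, `c1 = 2`, and `c2 = 3`, a process must count 6 ticks, which can take up to 18 time units. The gap between 10 and 18 comes entirely from the timing uncertainty.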


The lower bound proofs make use of new techniques. In order to prove the correctness of 
these algorithms, Lynch and Attiya developed a way to transform any timed automaton 
into a “regular” automaton, by building timing information into the automaton state, thus 
enabling the use of classic invariant assertion proof techniques. Surprisingly, this method can
be extended to prove the performance of algorithms. This direction is explored in [220].
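

The idea of building timing information into the state can be sketched with a drastically simplified, discrete-time example. It is not the construction of [23] or [220]; the scenario, the parameters `U` and `W`, and the function names are assumptions. A holder must release the resource by time `U`, and a waiter may enter only at time `W` or later. Once the current time `now` is part of the state, mutual exclusion becomes an ordinary invariant that can be checked over all reachable states.

```python
from collections import deque


def reachable(U, W, horizon):
    """All reachable states of the untimed automaton; a state is (holding, in_cs, now)."""
    start = (True, False, 0)
    seen, frontier = {start}, deque([start])
    while frontier:
        holding, in_cs, now = frontier.popleft()
        successors = []
        if holding:                         # release: allowed any time up to the deadline
            successors.append((False, in_cs, now))
        if not in_cs and now >= W:          # enter: only after waiting W
            successors.append((holding, True, now))
        # Time may pass only if it does not overrun the holder's deadline U.
        if now < horizon and (not holding or now + 1 <= U):
            successors.append((holding, in_cs, now + 1))
        for s in successors:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen


def mutual_exclusion_holds(U, W, horizon=20):
    """Invariant assertion: the holder and the waiter never overlap."""
    return all(not (holding and in_cs) for holding, in_cs, _ in reachable(U, W, horizon))
```

In this toy model the invariant holds for `U = 3, W = 4` but fails for `U = 3, W = 3`, because at time 3 the waiter may enter before the holder has released.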


Wait-free Coordination


Hagit worked with Danny Dolev (of IBM Almaden and Hebrew University) and Nir Shavit 
on wait-free solutions of problems using bounded shared memory. The goal of this research
is the development of tools that will eventually enable us to reveal the exact relationship 
between boundedness of memory and wait-freeness of operations. Two specific problems 
considered are: 


1. Randomized Consensus: All the algorithms known for this problem, and in particu-
lar the only polynomial algorithm ([19]), use shared memory with unbounded values.
Attiya, Dolev, and Shavit found a bounded polynomial solution to the problem [21], 
answering the open question of Abrahamson [2]. 


2. Snapshot Scan: A snapshot scan returns an “instantaneous” picture of memory. Such 
an algorithm will greatly simplify proofs of concurrent programs, and is an important 
building block in many algorithms [95][2][19]. The correctness of a bounded construc- 
tion that solves this problem is currently being proven (joint work with Michael Merritt
and Yehuda Afek of AT&T Bell Labs).


In joint work with Mark Tuttle [24], new proof techniques were developed to prove lower 
bounds for problems in both the shared memory and message passing models of computa- 
tion. These techniques are nontrivial generalizations of [108][217][67], and, as a result of the 
similarity of the proofs in the two models, have the advantage of exposing similarities be- 
tween the shared memory and message passing models. Using these techniques, it is possible 
to obtain a new tight lower bound for the slotted ℓ-exclusion problem [20] (a similar lower
bound was proved for the related problem of ℓ-assignment in [70]). Furthermore, these tech-
niques give simple proofs for known results, such as consensus [108] and processor renaming
[20].


Chris Colby


Chris Colby worked on two projects. 


Ken Goldman is developing an I/O automaton distributed simulation system. It will be used 
to aid in the design and study of distributed algorithms using the I/O automaton model. 
During the summer of 1988, Colby worked on the development of a graphical user-interface
for the system. The interface allows the user to graphically configure I/O automaton systems
by composing automata hierarchically and specifying topology information. The interface
will be used to observe automaton states during simulation, and may eventually be used to 
guide the particular execution path taken by the simulator. 


Colby also finished his undergraduate thesis entitled Correctness Proofs of the Peterson-
Fischer Mutual Exclusion Algorithms. In this thesis, the Peterson-Fischer 2-process mutual
exclusion algorithm [260] is introduced in a slightly modified form. An invariant-assertional
proof of mutual exclusion is presented for the 2-process algorithm. Next, the Peterson-
Fischer n-process mutual exclusion algorithm is introduced conceptually as a tournament
of ⌈lg n⌉ 2-process competitions. A mutual-exclusion proof of the n-process algorithm is
presented, based on a mapping between states of the n-process system and states of the 2- 
process system. This mapping delineates the correspondence between the 2-process code and 
one iteration (competition) of the n-process code. In this way, the statement of correctness 
of the 2-process algorithm is used as a lemma for the n-process proof. 
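

The tournament layout itself can be sketched briefly. The code below shows only which 2-process competitions a given process plays, assuming processes numbered 0 to n-1 placed at the leaves of a binary tree. It does not reproduce the Peterson-Fischer code, and the function name and numbering scheme are assumptions.

```python
def competitions(i, n):
    """List (level, node, side) for the 2-process competitions that process i plays.

    Processes are numbered 0..n-1 and sit at the leaves of a binary tree;
    at each level a process competes as side 0 or side 1 at one internal node.
    """
    levels = (n - 1).bit_length()           # equals ceil(lg n) for n >= 1
    return [(k, i >> (k + 1), (i >> k) & 1) for k in range(levels)]
```

For example, with n = 8, process 5 plays side 1 at node 2 of level 0, side 0 at node 1 of level 1, and side 1 at node 0 of level 2. Process 4 is its level-0 opponent, playing side 0 at the same node.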


Alan Fekete 


Alan Fekete was a member of the TDS Group until September 30, 1988. He worked on the 
theory of concurrency control for nested transactions. In particular, he developed with Nancy 
Lynch a way to use the general theory given in the paper [223] to prove a sufficient condition 
for correctness that resembles the “absence of cycles” condition used in the conventional 
theory of serializability for transactions without nesting. With this condition, they gave a 
simple direct proof of the correctness of Moss’ algorithm for read/update locking. He also 
found a way to model and verify some “optimistic” timestamp-based concurrency control 
algorithms that allow some transactions to proceed in the hope that no errors will occur,
and only check that in fact nothing went wrong before commit occurs (rather than before
each access to objects). Another area of Fekete’s work was the possibility and impossibility 
of solving certain problems concerned with building a reliable message service on top of an 
unreliable service. Results proved by Lynch, Mansour, and Fekete [222] were compared with 
those of Attiya, Fischer, Wang, and Zuck, which used a significantly different model of a
system, in order to see in which respects the results in one model carried over into the other.
A result proved with Lynch shows that some sequence information was essential, even under
very weak constraints on the system. 
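

For readers unfamiliar with the classical “absence of cycles” condition mentioned above, the following sketch shows it for flat (non-nested) transactions: build a conflict graph from a schedule and test it for cycles. It illustrates the conventional theory only, not the nested-transaction condition developed by Fekete and Lynch. The schedule encoding and function names are assumptions.

```python
def conflict_edges(schedule):
    """Edges Ti -> Tj of the conflict graph of a flat (non-nested) schedule.

    A schedule is a list of (transaction, op, item) with op "r" or "w".
    """
    edges = set()
    for a, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[a + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(u):
        state[u] = "active"
        for v in graph[u]:
            if state.get(v) == "active":
                return False
            if v not in state and not visit(v):
                return False
        state[u] = "done"
        return True

    return all(visit(u) for u in graph if u not in state)


serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
```

Here `serial` yields the single edge T1 -> T2 and passes the test, while `interleaved` yields edges in both directions and fails it.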


Ken Goldman 


Ken Goldman has been working on his Ph.D. thesis, Simulation of Concurrent Algorithms 
Using I/O Automata. He has been designing a strongly typed language based on the I/O
automaton model and a simulation system for studying algorithms expressed in that lan- 
guage. The system is intended as a research tool to aid in the design and understanding 
of distributed algorithms. Chris Colby has been implementing a graphical interface for the
system (see above). In [126], Goldman explored the possibility of distributing the simulation
in a highly concurrent manner, while fully preserving the semantics of the model.


As part of his area examination, Goldman studied the languages Lisp, Connection Machine 
Lisp, and Paralation Lisp in terms of their utility for writing efficient scientific programs 
on the Connection Machine. In [127], he reviews those languages and proposes a set of
extensions to Paralation Lisp for improving both the expressiveness and the efficiency of
that language.


John Leo


John Leo has been continuing work on his Master’s thesis I/O Automata Techniques and
Examples, expected to be completed by August 1989. The thesis is based on two examples,
SIFT and OLYMPIC TORCH, both involving process creation. The emphasis of the thesis
is to demonstrate proof techniques and create new tools for correctness proofs using I/O
automata. One such tool is a version of mappings from components of a composition to a
single higher level specification, adapted from [227]. Other tools will be theorems concerning
message passing systems and local improvements. 


Magda Nour 


Magda Nour worked on and finished her undergraduate thesis entitled An Automata-Theoretic
Model for UNITY.


UNITY—Unbounded Nondeterministic Iterative Transformation—is a computational model 
and a proof system to aid in the design of parallel programs developed by K. Mani Chandy 
and Jayadev Misra at the University of Texas. 
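

As a hedged sketch of the UNITY execution model: a program is a set of assignment statements, executed one at a time in an arbitrary but fair order, and computation is complete when a fixed point is reached, that is, when no statement changes the state. The Python below imitates this with random selection, which picks every statement infinitely often with probability 1. The adjacent-swap sorting program and all names are assumptions chosen for illustration.

```python
import random


def run_unity(state, statements, rng):
    """Execute statements in random order until none of them changes the state."""
    while True:
        if all(stmt(state) == state for stmt in statements):
            return state                    # fixed point reached
        state = rng.choice(statements)(state)


def swap_if_out_of_order(i):
    """Statement: A[i], A[i+1] := A[i+1], A[i]  if  A[i] > A[i+1]."""
    def stmt(a):
        if a[i] > a[i + 1]:
            return a[:i] + (a[i + 1], a[i]) + a[i + 2:]
        return a
    return stmt


data = (5, 2, 4, 1, 3)
program = [swap_if_out_of_order(i) for i in range(len(data) - 1)]
result = run_unity(data, program, random.Random(0))
```

Every effective swap removes one inversion, so the only fixed point is the sorted tuple, whatever order the statements are chosen in.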


The Input/Output Automaton model is a computational model developed by Nancy Lynch 
and Mark Tuttle that may be used to model concurrent and distributed systems.


This thesis connects these two theories. Specifically, it: 


1. defines Unity Automata, a subset of I/O automata based on the UNITY computational 
model, '' UNITY program; 


2. defines a mapping from UNITY programs to UNITY automata;


3. adapts the UNITY proof concepts to the I/O automaton computational model in order 
to obtain UNITY style proof rules for I/O automata;


4. adapts UNITY composition operators to the I/O automaton model and obtains com- 
position proof rules for them; and 


5. considers various examples illustrating the above work.


In addition, this work introduces an augmentation to the I/O automaton model which facil- 
itates reasoning about randomized algorithms, adapts UNITY concepts to it, and presents 
an example of a UNITY style high probability proof using such a model.


Stephen Ponzio


Stephen Ponzio is currently studying the complexity of solutions to the “dining philosophers 
problem” in a real time system. If the maximum amount of real time that any processor 
spends in the critical region is bounded by some constant c, then a simple alternating algo-
rithm guarantees a waiting time of 3c + O(1). He shows that in a system of n processors
which issue requests asynchronously, no algorithm can guarantee a waiting time of less than 
2c. He also gives an algorithm that guarantees 2c + O(n) for n even. Future research will 
improve upon the known algorithms or raise the lower bound by including terms such as 
message delay time. Other problems fundamental to distributed computing will also be 
considered. 


Nir Shavit 


Nir Shavit worked with Danny Dolev of IBM ARC on the Bounded-Concurrent-Time- 
Stamping problem. Concurrent time stamping is at the heart of solutions to some of the most 
fundamental problems in distributed computing. Based on concurrent-time-stamp-systems, 
elegant and simple solutions to core problems such as fcfs-mutual-exclusion, construction 
of a multi-reader-multi-writer atomic register, probabilistic consensus, etc. were developed. 
Unfortunately, the only known implementation of a concurrent-time-stamp-system has been 
theoretically unsatisfying, since it requires unbounded size time-stamps; in other words, 
unbounded memory. Not knowing if bounded concurrent-time-stamp-systems are at all con- 
structible, researchers were led to constructing complicated problem-specific solutions to 
replace the simple unbounded ones. In this work, for the first time, a bounded implementa-
tion of a concurrent-time-stamp-system is presented. It provides a modular unbounded-to-
bounded transformation of the simple unbounded solutions to problems such as those above. It
allows solutions to two formerly open problems, the bounded-probabilistic-consensus prob-
lem of Abrahamson [2], and the FIFO-ℓ-exclusion problem of [110], and a more efficient
construction of mrmw-atomic registers. This work [95] was presented at STOC 1989.
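

The simplicity of the unbounded approach can be seen in a sequential sketch: a process takes a new time-stamp by choosing a label larger than every label currently held. The code ignores concurrency entirely, and its names are assumptions. It is meant only to show why labels grow without bound, which is the property the bounded construction of [95] removes.

```python
def new_label(labels, i):
    """Give process i a label larger than every label currently held."""
    labels[i] = max(labels) + 1
    return labels[i]


def order(labels):
    """Processes ordered by label, oldest first; ties broken by process index."""
    return sorted(range(len(labels)), key=lambda i: (labels[i], i))


labels = [0, 0, 0]
new_label(labels, 2)
new_label(labels, 0)
```

After these two calls the labels are `[2, 0, 1]`, so the processes are ordered 1, 2, 0. Each further call increases the maximum label by one.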


Shavit also worked with Hagit Attiya and Danny Dolev on wait-free solutions of problems
using bounded shared memory. The goal of this research is the development of tools that
will eventually enable us to reveal the exact relationship between unboundedness of memory
and wait-freeness of operations. Two specific problems were considered:


1. Randomized Consensus: All the algorithms known for this problem, and in particular 
the only polynomial algorithm (due to Aspnes and Herlihy), use shared memory with
unbounded values. A bounded polynomial solution to the problem [21], answering the
open question of Abrahamson [2], will be presented at PODC 1989.


2. Snapshot Scan: We are interested in a scan algorithm that will return an “instanta- 
neous” picture of memory. Such an algorithm will greatly simplify proofs of concurrent 
programs, and is an important building block in many algorithms [95][2][19]. We are
currently in the process of proving the correctness of a bounded construction that we
believe solves this problem (joint work with Mike Merritt and Yehuda Afek).


Shavit also worked with Baruch Awerbuch and Yishay Mansour on the end-to-end problem
in unreliable networks. This is a fundamental problem, dealing with the question of how
two nodes in a communication network, such as Bitnet, can guarantee communi-
cation even if they are only eventually connected (see [315]). We present the first polynomial
solution to the problem [31], opening the possibility that with further research, a practically 
efficient solution to the problem may be found. 


Shavit is currently working with Hagit Attiya and Nancy Lynch on a line of research that 
will lead to a better understanding of the notion of wait-freeness and its relation to fault 
tolerance. We are currently attempting to verify whether existing definitions really capture the
intuitive properties attributed to wait-free primitives. 


Shavit is in the process of completing a draft of joint work with Mike Merritt and Yehuda 
Afek on the local snapshot algorithm, an algorithm that performs snapshots in distributed 
message passing systems, with time and communication complexity dependent on the flow
of computation of the application, rather than the size of the complete network. 


Also, Shavit is writing his Ph.D. thesis, touching on many of the research topics mentioned 
above. 


Greg Troxel 


Greg Troxel refined an algorithm he developed for detecting and recovering from process- 
execution resource deadlock in a system using remote procedure calls intended for the Fault 
Tolerant Parallel Processor being developed at the Charles Stark Draper Laboratory. He is 
constructing a proof using I/O automata that this algorithm is correct. Interesting issues in 
the proof are the formal specification of correctness conditions of the algorithm, since this
algorithm functions as a scheduler for the remote procedure call system, and demonstration 
of liveness properties. 


Mark Tuttle 


Mark Tuttle is primarily interested in understanding the correctness and construction of 
distributed algorithms in terms of the “knowledge” individual processors in a distributed 
system have about their environment (e.g., the local states of neighboring processors, etc.). 


His current interest is understanding cryptographic protocols and system security in terms 
of formal notions of knowledge (e.g., [145]). In order to think about probabilistic protocols
like these in terms of knowledge, we have to be able to answer the question “What should it
mean for an agent to know or believe an assertion is true with probability .99?”. Different 
papers [102][109][145] give different answers, choosing to use quite different probability spaces 
when computing the probability an agent assigns to an event. In [146], joint work with Joe 
Halpern, they show no single choice is correct in all contexts, and show for any given context 
how to make the most appropriate choice. They show that each choice can be understood 
in terms of a betting game, and that each choice corresponds to betting against a different
opponent. They consider three types of adversaries. The first selects the outcome of all
nondeterministic choices in the system; the second represents the knowledge of the agent’s 
opponent (this is the key place the above-mentioned papers differ); the third is needed in 
asynchronous systems to choose the time the bet is placed. They illustrate the need for 
considering all three types of adversaries with a number of examples. Given a class of 
adversaries, they show how to assign probability spaces to agents in a way most appropriate 
for that class, where “most appropriate” is made precise in terms of this betting game. 
They conclude by showing how different assignments of probability spaces (corresponding to 
different opponents) yield different levels of guarantees in coordinated attack.


In [241], it is shown how to construct extremely fast, efficient protocols for problems like 
consensus in synchronous systems, problems requiring processors to perform coordinated 
actions simultaneously. It is shown that the construction of such protocols reduces to testing 
for a state of knowledge called common knowledge. Unfortunately, these results do not 
extend to asynchronous systems; in fact, it is known the state of common knowledge cannot
be attained in such systems [144]. In such systems, however, the state of eventual common 
knowledge [144] appears to be very closely related to the solution of such problems, but there 
are no useful tools for proving that this state of knowledge is or is not attained, let alone for 
a processor to test for this state of knowledge. In [302], Tuttle gives a new game-theoretic 
characterization of eventual common knowledge, a characterization that may be a first step 
in developing such tools. 


In joint work with Hagit Attiya [24], new proof techniques are developed to prove lower 
bounds for problems in both the shared memory and message passing models of computa- 
tion. These techniques are nontrivial generalizations of [108][217][67], and, as a result of 
the similarity of the proofs in the two models, have the advantage of exposing similarities 
between the shared memory and message passing models. Using these techniques, it is pos-
sible to easily obtain known lower bounds on consensus [108], processor renaming [20], and
ℓ-exclusion [70].


Other work by Tuttle this year includes further exposition [226] with Nancy Lynch of the 
Input/Output Automaton model of distributed computation formalized in the course of his 
Master’s thesis work [225], and further thought [238] on the problem of adding time to the 
I/O automaton model to allow the model to be used to reason about real time systems.


14.4 Publications


[1] J. Aspnes, A. Fekete, N. Lynch, M. Merritt, and W. Weihl. A theory of timestamp-
based concurrency control for nested transactions. In Proceedings of 14th International
Conference on Very Large Data Bases, pages 431-444, Los Angeles, CA, August 1988. 


[2] H. Attiya, D. Dolev, and N. Shavit. Bounded polynomial randomized consensus. To ap- 
pear in Proceedings of the Eighth Annual ACM Symposium on Principles of Distributed 
Computing, 1989. 


[3] H. Attiya, M. Fischer, D. Wang, and L. Zuck. Reliable communication over an unreliable 
channel. In progress. 


[4] H. Attiya and N. Lynch. Time bounds for real time process control in the presence of 
timing uncertainty. Submitted for publication. 


[5] H. Attiya and M. Snir. Better computing on an anonymous ring. Submitted for 
publication. 


[6] H. Attiya and M. Tuttle. Bounds for slotted ℓ-exclusion. February 1989. Unpublished
manuscript.


15.1 Overview of the MIT X Consortium


The MIT X Consortium was formed in January 1988 to further the development of the 
X Window System. The major goal of the Consortium is to promote cooperation within 
the computer industry in the creation of standard software interfaces at all layers in the 
X Window System environment. MIT’s role is to provide the vendor-neutral architectural 
and administrative leadership required to make this work. The Consortium is financially 
self-supporting, with membership open to any organization. At present, over 60 companies 
belong to the Consortium, as well as several universities. These members represent the bulk 
of the US computer industry, as well as a considerable segment of the international industry.


15.2 Current Status and Future Plans 


15.2.1 Release 3 


One of the primary tasks of the Consortium staff is the maintenance and evolution of a soft-
ware distribution containing sample implementations of all interfaces defined by the Consor-
tium, as well as numerous applications and utilities. In October, Release 3 of this distribu-
tion, consisting of 26 megabytes of source code, was made available to the world, along with
a companion collection of roughly 80 megabytes of source code of user-contributed software. 
The distribution is available using anonymous FTP from a number of Internet sites, and on 
magnetic tape from the MIT Software Center. 


15.2.2 Configuration Management 


Configuration management is an important aspect of any large software system, and the X 
distribution is no exception. Development is performed on more than half a dozen different 
platforms, each running a different operating system, and the system is used externally on 
a variety of additional platforms. Details of how to build and install the system vary with 
every operating system (networking interfaces, program names, compile options, header and 
library files, etc.) and machine type (low level graphics code is device-specific), as well as 
with site-specific preferences (where programs should be installed, default values, etc.). 


Jim Fulton has done extensive work during the past year on configuration management 
for the X distribution. Release 3 has been widely praised for its ease of portability and 
installation, and significant improvements have been made since then.


15.2.3 X Conference 


In January, we hosted the Third Annual X Technical Conference. The purpose of the confer-
ence is to present and discuss leading edge research and development in the X environment
from both academia and industry. This year’s conference consisted of seven tutorials, 24
presentations, and 14 informal “birds of a feather” sessions, spread over three days. Donna
Converse and Michelle Leger handled the bulk of the organizational details, including reg-
istration, scheduling, catering, conference proceedings, and video tape coordination. The
conference was attended by approximately 1100 people, free of charge, and was well re- 
ceived. Since attendance has grown every year and we reached the seating capacity of MIT 
facilities, we expect to hold next year’s Conference off-campus. 


15.2.4 MIT Research Funding 


The X Consortium membership would like to see MIT remain the leader and focal point in 
research and development of the X Window System. To that end, the X Consortium created 
a funding pool to encourage MIT faculty, staff, and students to participate in X-related 
research and development activities. The pool for 1989 was set at $250,000. Although 
relatively few proposals have been received, two projects have been approved, and approval 
of several more is anticipated. 


15.2.5 Sample Server Implementation 


Keith Packard made several improvements to the MIT X server implementation. Using a 
prototype provided by Adam de Boor at UC Berkeley, Keith wrote a device-independent im- 
plementation of backing-store and save-unders (in which the server saves portions of windows 
that are obscured by other windows so that some exposures can be handled automatically)
for Release 3. In addition, Keith spent considerable time and energy producing an imple-
mentation for drawing arcs that conforms to the X protocol specification. Although the
version that was distributed in Release 3 was rather slow, it has been sped up significantly.
In addition, we recently received some very important enhancements from Joel McCormack
at Digital Equipment Corporation that optimize region operations and window hierarchy
manipulations. Keith added some of his own improvements and simplifications to this code
and integrated it into our implementation. 


Up through Release 3, failure to allocate memory would result in server termination. Since 
catastrophic failure is not a desired response (particularly in limited memory environments 
such as X terminals), this was of considerable concern. Bob Scheifler has since reworked the 
server to survive most memory allocation failures, reporting errors back to the requesting 
client in accordance with the X protocol specification, and continuing to operate normally. 
The only failures which are not yet handled adequately are those occurring during region 
operations, principally during window hierarchy reconfiguration. Bob Scheifler and Keith 
Packard have developed a strategy for gracefully surviving these failures, and future imple- 
mentation work is planned. 


In addition to recovering from allocation failures, overall memory consumption in the server 
has been significantly reduced. Bob Scheifler and Keith Packard have designed new data 
structures for the major server resources (windows and graphics contexts) that should cut 
memory size approximately in half. A further benefit of this work was a revised strategy for 
layering in the server (based on the usual object-oriented strategy of using wrappers around 
methods) that will considerably simplify the implementations of backing-store and software 
cursors. 
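
The wrapper strategy mentioned above can be illustrated with a minimal sketch. It is not the server's actual code: the `Screen` class, the `fill_rect` method slot, and `install_software_cursor` are hypothetical names chosen for illustration. The idea shown is only that a layer saves the method it finds in a slot, installs its own, and calls the saved one in between its own work.

```python
class Screen:
    """Hypothetical stand-in for a server screen with replaceable methods."""

    def __init__(self):
        self.log = []
        # Method slot that higher layers are allowed to wrap.
        self.fill_rect = self._base_fill_rect

    def _base_fill_rect(self, x, y, w, h):
        self.log.append(("fill", x, y, w, h))


def install_software_cursor(screen):
    """Wrap fill_rect so the cursor is lifted before drawing and put back after."""
    wrapped = screen.fill_rect  # remember whatever method sits underneath this layer

    def fill_rect(x, y, w, h):
        screen.log.append("hide-cursor")
        wrapped(x, y, w, h)
        screen.log.append("show-cursor")

    screen.fill_rect = fill_rect


screen = Screen()
install_software_cursor(screen)
screen.fill_rect(0, 0, 10, 10)
print(screen.log)
```

Because each layer only knows about the method directly beneath it, a backing-store layer could in principle be stacked in the same way without either layer knowing about the other.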


15.2.6 Standard Colormaps 


Standard Colormaps are a mechanism that allows applications to share commonly-used color 
resources, with an efficient mapping from RGB color values to pixel values for display. Keith 
Packard and Donna Converse have developed an algorithm for constructing Standard Col- 
ormaps which permits other applications (those using the normal X protocol color lookup 
facilities) to also share the same color resources. This is particularly important on typical 
color workstations today, which support only a single hardware colormap with a limited 
number of colormap entries. Keith Packard also developed a revised algorithm for creating 
Standard Colormaps that are linear ramps through the RGB cube, which now allows 
applications to treat gray scale and other linear maps uniformly with all other 
Standard Colormaps. Donna Converse implemented a set of routines for creating Standard 
Colormaps for all of the visual classes defined by the X protocol. 
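
As a rough illustration of the "efficient mapping from RGB color values to pixel values", the sketch below assumes the pixel formula documented for Xlib's XStandardColormap structure (a base pixel plus per-channel multipliers). The field names follow that structure; the two example colormaps and their values are hypothetical, and the gray ramp is expressed by zeroing the green and blue terms purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandardColormap:
    # Field names mirror Xlib's XStandardColormap; the values used below are made up.
    base_pixel: int
    red_max: int
    red_mult: int
    green_max: int
    green_mult: int
    blue_max: int
    blue_mult: int


def rgb_to_pixel(cmap, r, g, b):
    """Map r, g, b in [0.0, 1.0] to a pixel value with arithmetic only."""
    ri = int(r * cmap.red_max + 0.5)
    gi = int(g * cmap.green_max + 0.5)
    bi = int(b * cmap.blue_max + 0.5)
    return (cmap.base_pixel
            + ri * cmap.red_mult
            + gi * cmap.green_mult
            + bi * cmap.blue_mult)


# Hypothetical 3/3/2-bit allocation filling an 8-bit hardware colormap.
RGB_332 = StandardColormap(0, 7, 32, 7, 4, 3, 1)

# Hypothetical 16-entry gray ramp starting at pixel 32, written as a linear
# ramp so the same function serves both kinds of map.
GRAY_16 = StandardColormap(32, 15, 1, 0, 0, 0, 0)

print(rgb_to_pixel(RGB_332, 1.0, 0.0, 0.0))
print(rgb_to_pixel(GRAY_16, 0.5, 0.0, 0.0))
```

The point of the design is that a client needs no round trip to the server per color: once it has read the colormap description, every lookup is a handful of multiplications and additions.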


15.2.7 Graphics Benchmark 


Dan Schmidt, working with Jim Fulton, developed a graphics benchmark and demonstration 
program. The program provides a simple interface for exercising all of the attributes that can 
affect graphical output (such as line style, cap style, join style, fill style, dash pattern, and 
line width) in combination with the various graphical primitives (such as points, lines, arcs, 
and text). In addition to providing a means for obtaining performance data, this program 
will also be a valuable interactive demonstration of the X graphics model. 


15.2.8 Xt Intrinsics 


A major accomplishment in the past year has been standardization (within the Consortium) 
of the Xt Intrinsics, an object-oriented foundation for building user interface toolkits. Such 
toolkits (including the MIT Athena Widgets set) are available from a growing number of 
vendors. 


Work on the Intrinsics is far from finished, however. As vendors have begun to use the 
Intrinsics in earnest, a number of deficiencies (in both function and performance) have been 
identified. In particular, using a window for every user interface component (called “wid- 
gets”) caused concern over the amount of memory used in both the client (where the toolkit 
resides) and in the server. As a result, a proposal for windowless widgets (originally designed 
and implemented by Digital Equipment Corporation, now used by a number of companies) 
is currently under review within the Consortium, under the overall guidance of Ralph Swick 
of MIT Project Athena. 



15.2.9 Athena Widgets 


Chris Peterson has been doing considerable work fixing and enhancing the Athena Widget 
set, which is used in a growing number of our core applications. The most significant recent 
addition is a long-awaited menu widget, supporting both pulldown and popup menus. This 
widget has now replaced several incompatible menu implementations in our distribution and 
is expected to be widely used. 


15.2.10 Core Components 


There are now several product-quality widget sets built on top of the Xt Intrinsics. Although 
these widget sets all provide remarkably similar functionality, they differ considerably in their 
graphical user interface (appearance and behavior) and their programmatic interface. For 
the programmers who wish to have their applications blend in with the other applications 
on a given vendor’s platform, the ability to easily retarget a given application to more than 
one graphical user interface is crucial. Unfortunately, with present toolkits, the differences 
in programmatic interfaces make this task quite difficult in practice. 


The Core Components effort is an attempt to specify a policy-free application programmer's 
interface that would permit different implementations to embody disparate graphical user 
interface policies, transparent to the application programmer. Dana Laursen of Hewlett- 
Packard is the chief architect of the Core Components. Ralph Swick of MIT Project Athena 
and Bob Scheifler have contributed to the fundamental architecture, and a variety of engi- 
neers in several Consortium organizations have contributed to the design. Although consid- 
erable progress has been made, a number of very hard problems still exist, such as how to 
permit subclassing without exposing functionality that is specific to a particular graphical 
user interface. The time and resource investment required to complete the research now 
appears to be too large for many Consortium organizations, who must focus in the short 
term on shipping initial toolkit products. 


15.2.11 Inter-client Communication Conventions Manual 


The Inter-client Communication Conventions Manual (ICCCM) establishes policies covering 
a number of mechanisms in the X protocol in order to allow applications from independent 
vendors to coexist and cooperate in the X environment. The ICCCM covers the use of the 
selection mechanism for peer-to-peer data exchange (e.g. in cut and paste operations), client 
to window manager communication (for dealing with title bars, icons, geometry, input focus, 
colormaps, etc.), client to session manager communication (for client checkpoint and window 
deletion), and client manipulation of keyboard and pointing device attributes. The overall 
architect for the ICCCM has been David Rosenthal of Sun Microsystems, with considerable 
input from engineers in various Consortium organizations and all of the MIT staff. The 
document is now out for its second public review, and a final standard will be produced 
shortly. Jim Fulton designed and implemented the Xlib changes required to support the 
ICCCM, and these changes are also now out for public review. 


15.2.12 X Display Manager 


Keith Packard produced XDM, the X Display Manager, for Release 3. This daemon manages 
a collection of X displays, including X terminals, on a given host. It provides for authenti- 
cating a user at a display (i.e., login) and running the user’s session. Although designed to 
work with both hardware displays local to the host and with remote X terminals, a control 
protocol is required to make X terminals work well. In particular, when an X terminal is 
powered on or reset, a mechanism is needed to inform the host’s XDM daemon that the 
terminal is now up, so that login can be initiated. Keith Packard and Bob Scheifler have 
designed the X Display Manager Control Protocol (XDMCP) for this purpose. The protocol 
also deals with network security issues, and permits centralized configuration management 
in an environment with a large number of terminals and potential login hosts. The protocol 
is currently under review within the Consortium. 


15.2.13 Security 


Security has long been a low priority issue in the X world. However, with the increase in 
commercial X products, and with the rash of computer break-ins and viruses over the past 
year, interest in security is now rather high. The default host-based access control mechanism 
in the core X protocol is simply not adequate in most environments. Keith Packard and Jim 
Fulton have designed and implemented a basic framework for allowing X clients to send 
authorization information to the X server, and Keith Packard put a simple encryption-based 
authorization scheme into the X Display Manager and the X server as a test. A more secure 
scheme is being worked on as part of the X Display Manager Control Protocol, and work is 
ongoing at MIT Project Athena to integrate Kerberos as an authorization mechanism. 


15.2.14 Compound Text 


Internationalization (or localization) of user interface software is increasingly important to X 
vendors. A key aspect of this is dealing with text in languages other than English. There are 
three important uses of text in the X environment that are external to applications: inter-client 
communication using selections (e.g. cut and paste); window properties (e.g. text for 
title bars); and resources (e.g. text for labels and prompts). Typically, different languages 
have different character sets, and each character set is given a particular encoding (usually 
one or two bytes per character). In some cases, the characters used for a single language are 
split across more than one character set encoding. 


Bob Scheifler developed a format for multiple character set data, called Compound Text, 
based on ISO standards for encoding and combining character sets (ISO 2022 and ISO 6429). 
Compound Text is intended to be an external representation, or interchange format, for use 
in the three areas listed above. It is not intended to be an internal representation within 
an application; it is expected (but not required) that clients will convert Compound Text 
to some internal representation for processing and rendering, and convert from that internal 
representation to Compound Text when providing textual data to another client. The 
format supports the standard ISO 8859 character set encodings and the standard Japanese, 
Chinese, and Korean character set encodings, and encourages their use, but also allows 
non-standard encodings to be used. Horizontal direction of text can also be encoded. The 
Compound Text specification is now out for public review. 


15.2.15 X Logical Font Description Conventions 


Jim Flowers of Digital Equipment has been the chief architect of the X Logical Font Description 
Conventions, which establish a standard parsable font name format and standard font 
properties, providing X clients a server-independent means to query and use a rich collection 
of fonts. For example, the conventions provide an adequate set of typographic font attributes 
for publishing and other applications to do intelligent font matching or substitution when 
handling documents; automatically place subscripts and superscripts; and determine small 
capital heights, recommended leading, and wordspace values. Bob Scheifler contributed to 
the design, and Jim Fulton worked to ensure that fonts contributed to MIT and font support 
mechanisms (such as the font compiler) conform to the conventions. The conventions are 
now out for public review prior to finalizing the standard. 
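
To make "parsable font name format" concrete, the following sketch splits a font name into the fourteen hyphen-separated fields of the published XLFD specification. Since the conventions were still under review when this report was written, the exact field list in the draft may have differed; the helper names `parse_xlfd` and `matches` and the sample font name are illustrative assumptions.

```python
# Field order as given in the published XLFD specification.
XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
)


def parse_xlfd(name):
    """Split an XLFD font name into a dict of named fields."""
    if not name.startswith("-"):
        raise ValueError("XLFD names begin with a hyphen")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(XLFD_FIELDS), len(parts)))
    return dict(zip(XLFD_FIELDS, parts))


def matches(font, **wanted):
    """Case-insensitive comparison on whichever fields the caller cares about."""
    return all(font[key].lower() == str(value).lower() for key, value in wanted.items())


font = parse_xlfd("-adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1")
print(font["family_name"], font["point_size"])
print(matches(font, weight_name="Medium", slant="r"))
```

In the published specification the point size is expressed in tenths of a point, so the `120` in the sample name denotes a 12-point font. A publishing application could use a comparison like `matches` as the first step of font substitution, relaxing fields one at a time until some installed font qualifies.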


15.2.16 Font Server 


With the advent of X terminals and other limited-memory servers, and the significant in- 
crease in quality screen fonts, the idea of a font server is now attractive. A font server is a 
program for providing font information to client programs (such as X servers, or even print 
servers and document previewers). Like the X server, the font server will be able to simultaneously 
support multiple clients of differing architectures over any virtual stream connection. 
The font server should be able to deal with multiple font (input and output) formats, and 
provide partial font information (so that limited client memory can be used as a cache). 
The font server must also be able to enforce licensing restrictions on a per-font basis. Tom 
Porcher of Digital Equipment formulated a requirements document for font service, and Jim 
Fulton put together a preliminary design for a font server protocol. Daniel Dardailler of 
Bull is interested in pursuing the design and implementation to completion, and we hope for 
more progress next year. 


15.2.17 PEX Sample Implementation 


PEX is a 3D graphics extension to the X protocol, supporting the PHIGS and PHIGS+ 
graphics interfaces. In order to establish proof of concept of the PEX design, and to promote 
the use of PEX, an effort is now underway to build a sample implementation of both the PEX 
server extension and a full client library (providing the PHIGS and PHIGS+ programming 
interface). Bob Scheifler, working with several interested companies, put together a Request 
For Proposals. A number of bids were received, and one was selected based on input from 
potential sponsors. There are now 15 sponsors of the implementation, which is scheduled for 
release to the public in the early spring of 1991. 


15.2.18 Input Device Extension 


The core X protocol deals only with one keyboard and one pointing device. However, many 
workstations have an assortment of input devices which would be very useful to X applications. 
George Sachs of Hewlett-Packard and Mark Patrick of Ardent Computer produced an 
X protocol extension and a C library for dealing with additional input devices. The primary 
devices supported are those with keys, buttons, and one or more axes of motion. In addition, 
the extension is designed to itself be extensible, so that new classes of input devices, and new 
combinations of classes, can easily be added. Bob Scheifler and Keith Packard contributed 
to the design, as have engineers from a number of Consortium organizations. 


15.2.19 Video Extension 


Todd Brunhoff of Tektronix has been working on an X protocol extension to provide an X 
interface to the generally interesting aspects of displaying live video in windows, capturing 
graphics from windows and converting them to a video signal, and managing the network 
of connections to and from devices that may receive or produce these signals, such as video 
tape recorders. 


15.2.20 Multi-buffering and Stereo Extension 


Jeff Friedberg and Larry Seiler of Digital Equipment, and Jeff Vroom of Stellar Computer, 
worked to merge several different double-buffering proposals into a single coherent X protocol 
extension for supporting multi-buffering and stereoscopic viewing of windows. The extension 
allows multiple, independently-addressable output buffers to be associated with normal and 
stereo windows. Any of the buffers can be displayed in the window, and a series of buffers 
can be displayed in rapid succession to achieve a smooth animation. Keith Packard and Bob 
Scheifler contributed to the design, which is now out for public review. 
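
The animation technique described above can be sketched as follows. This is a toy model rather than the extension's protocol or its C binding: `MultiBufferedWindow`, `draw`, `display`, and `animate` are hypothetical names, and a "frame" is just an opaque value. It shows each frame being rendered into a buffer that is not currently visible and only then being displayed.

```python
class MultiBufferedWindow:
    """Toy model of a window with several independently addressable buffers."""

    def __init__(self, buffer_count):
        if buffer_count < 1:
            raise ValueError("a window needs at least one buffer")
        self.buffers = [None] * buffer_count
        self.displayed = None  # index of the visible buffer; None until first display

    def draw(self, index, frame):
        # Rendering into a buffer does not change what is on screen.
        self.buffers[index] = frame

    def display(self, index):
        # Switching the visible buffer is the only step the viewer sees.
        self.displayed = index
        return self.buffers[index]


def animate(window, frames):
    """Cycle through the buffers, drawing each frame off screen before showing it."""
    shown = []
    count = len(window.buffers)
    for k, frame in enumerate(frames):
        target = k % count
        window.draw(target, frame)
        shown.append(window.display(target))
    return shown


window = MultiBufferedWindow(2)
print(animate(window, ["frame0", "frame1", "frame2"]))
```

With two or more buffers, the buffer being drawn is never the one on screen once the animation is running, which is what keeps partially drawn frames from being visible.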


15.2.21 Nonrectangular Windows 


Keith Packard designed and implemented an X protocol extension for changing the visible 
shape of a window to an arbitrary nonrectangular shape, including forms composed of disjoint 
pieces. Each window is defined by two regions: the bounding region and the clip region. The 
bounding region is the area of the parent window which the window will occupy (including 
border). The clip region is the subset of the bounding region which is available for subwindows 
and graphics. The area between the bounding region and the clip region is defined 
to be the border of the window. The extension proved remarkably simple to implement in 
the server, with changes required in just a few places. The added functionality imposes no 
cost penalty on rectangular windows, and performance for nonrectangular windows is quite 
acceptable. The extension is now under review within the Consortium. 
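
The relationship between the two regions can be written down directly. In the sketch below, regions are modeled as plain sets of pixel coordinates, which is an assumption made for clarity (a real server would use a more compact region representation), and the helper names `rect` and `border_region` are hypothetical.

```python
def rect(x, y, width, height):
    """Return the set of pixel coordinates covered by a rectangle."""
    return {(i, j) for i in range(x, x + width) for j in range(y, y + height)}


def border_region(bounding, clip):
    """The border is whatever lies inside the bounding region but outside the clip region."""
    if not clip <= bounding:
        raise ValueError("the clip region must be a subset of the bounding region")
    return bounding - clip


# A window made of two disjoint 4x4 pieces, each with a one-pixel border.
bounding = rect(0, 0, 4, 4) | rect(10, 0, 4, 4)
clip = rect(1, 1, 2, 2) | rect(11, 1, 2, 2)
border = border_region(bounding, clip)

print(len(bounding), len(clip), len(border))
```

An ordinary rectangular window is just the special case in which both regions are single rectangles, which is consistent with the observation that the added functionality costs rectangular windows nothing.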


15.2.22 X Testing Consortium 


The X Testing Consortium is a loosely bound group of approximately a dozen companies 
working together on comprehensive test software for the X protocol and the Xlib C language 
interface to it. Bob Scheifler has been meeting with this group over the past year, providing 
some guidance. The output of the group will be both informal test specifications, and 
code implementing those specifications. The group also did some work on performance 
benchmarks, and Jim Fulton provided review and feedback on that work. Producing a 
complete test suite proved to require considerably more effort than expected, and various 
companies are now pulling resources off the project to work on other tasks. The group 
will be producing a final but incomplete release soon. Bob Scheifler is working on plans to 
continue the effort within the X Consortium, and to expand the testing effort to cover other 
components of the X Window System. 


15.2.23 Formal Standards 


The X3H3.6 Window System Task Group, under the X3H3 Computer Graphics Standards 
Committee, under ANSI, has been working on formal standardization of the X protocol for 
about two years now. Bob Scheifler has been attending the X3H3.6 meetings and participat- 
ing in their deliberations. The protocol specification has recently gone out for letter ballot 
within X3H3, but an ANSI standard is still perhaps fourteen months away. The task group 
had planned to start work on Xlib by now, but lack of resources has made that impractical. 
Both the task group and the X Consortium have recently approved a working relationship, 
in which X3H3.6 will ask the X Consortium to develop resolutions to technical issues as they 
arise during the remainder of the ANSI process. 


The IEEE Technical Committee on Operating Systems formed a new working group, P1201.1, 
to formally standardize (under ANSI) toolkit functionality and behavior in the X environ- 
ment. Bob Scheifler and Chris Peterson have to date shared responsibilities for interacting 
with this group. Their immediate goal appears to be standardization of a widget set based 
on the Xt Intrinsics, but to do so requires that the Intrinsics and Xlib be standardized. The 
group does not currently have the resources to do this, and is coordinating with X3H3.6 in 
an attempt to develop a workable plan. 


The National Institute of Standards and Technology issued a proposed Federal Information 
Processing Standard for the X Window System, composed of the standard specifications in 
Release 3 of the X Consortium software distribution: the X protocol, the Xlib C binding, the 
Xt Intrinsics, and the Bitmap Distribution Format for fonts. The fact that NIST issued this 
FIPS without waiting for formal standardization by ANSI caused considerable consternation 
in the formal standards world, and the fact that the FIPS appears to make the Intrinsics an 
exclusive federal standard (the Intrinsics are a non-exclusive standard in the X Consortium) 
caused considerable consternation on the part of certain X vendors. Bob Scheifler has been 
working with all of these players to ensure that technical arguments are correct, and to try 
to keep the political arguments in perspective. 


15.2.24 Registration 


We established a mechanism to allow the X community to register the following significant 
items: organization names (used as prefixes for other names), keysyms, authorization pro- 
tocol names, vendor server string formats, protocol extension names, host address family 
formats, window property names and types, selection names and targets, window manager 
protocols, font foundry names, font property names, resource types, and application classes. 
The primary goal of registration is to avoid conflicting use of a given name or value. The 
secondary goal is to encourage use of these items by more than one organization. 


15.2.25 Graphical Programming Environment for Configuration 


Geeta Khare explored the specification, design, and implementation of a graphical pro- 
gramming language to address a class of problems known as configuration problems, those 
concerned with the correctness and completeness of a collection of items and/or the ar- 
rangement of those items under a set of constraints. Domain specificity of the language 
is achieved through the types and operations provided. A configuration task grammar was 
produced, which lists the breakdown of configuration tasks into subtasks and primitives, and 
a categorization was produced of objects found in configuration problems, allowing reuse of 
primitives in different configuration problems. A prototype was implemented in Common 
Lisp using KnowledgeCraft. 